Applying Machine Learning to Ultrafast Shape Recognition in Ligand-Based Virtual Screening

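Metric (unit); Computer science; Virtual screening; Similarity (geometry); Mixture model; Artificial intelligence; Pattern recognition (psychology); Machine learning; Artificial neural network; Gaussian distribution; Drug discovery; Image (mathematics); Physics; Bioinformatics; Quantum mechanics; Biology; Economics; Operations management
Authors
Etienne Bonanno, Jean-Paul Ebejer
Source
Journal: Frontiers in Pharmacology [Frontiers Media SA]
Volume: 10 Citations: 14
Identifier
DOI:10.3389/fphar.2019.01675
Abstract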

Ultrafast Shape Recognition (USR) and its derivatives are Ligand-Based Virtual Screening (LBVS) methods that condense 3-dimensional information about molecular shape, as well as other properties, into a small set of numeric descriptors. These descriptors can be used to efficiently compute a measure of similarity between pairs of molecules using a simple inverse Manhattan distance metric. In this study we explore the use of suitable Machine Learning techniques that can be trained using USR descriptors, so as to improve the similarity detection of potential new leads. We use molecules from the Directory of Useful Decoys, Enhanced (DUD-E) to construct machine learning models based on three different algorithms: Gaussian Mixture Models (GMMs), Isolation Forests, and Artificial Neural Networks (ANNs). We train models based on full molecule conformer models, as well as on the Lowest Energy Conformations (LECs) only. We also investigate the performance of our models when trained on smaller datasets, so as to model virtual screening scenarios in which only a small number of actives are known a priori. Our results indicate significant performance gains over a state-of-the-art USR-derived method, ElectroShape-5D (ES5D), with GMMs obtaining a mean performance up to 430% better than that of ES5D in terms of enrichment factor, with a maximum improvement of up to 940%. Additionally, we demonstrate that our models are capable of maintaining their performance, in terms of enrichment factor, within 10% of the mean as the size of the training dataset is successively reduced. Furthermore, we demonstrate that running times for retrospective screening using the machine learning models we selected are faster than standard USR, on average by a factor of 10, including the time required for training. Our results show that machine learning techniques can significantly improve the virtual screening performance and efficiency of the USR family of methods.
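
To make the descriptor and similarity step concrete, the following is a minimal Python sketch of plain USR, not the authors' code. It assumes the commonly cited formulation: four reference points (molecular centroid, atom closest to the centroid, atom farthest from the centroid, atom farthest from that atom) and three moments of the atomic distance distribution per reference point, giving 12 descriptors. The exact moment definitions vary between implementations, so the choice here (mean, standard deviation, cube root of the third central moment) is an assumption, as are the function names and the toy coordinates.

```python
import math


def _moments(dists):
    """Three moments of a distance distribution (assumed variant:
    mean, standard deviation, cube root of the third central moment)."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    # Signed cube root, so a negative third moment stays negative.
    return [mean, math.sqrt(var), math.copysign(abs(third) ** (1.0 / 3.0), third)]


def usr_descriptor(coords):
    """Return 12 USR-style descriptors for a list of (x, y, z) atom positions."""
    n = len(coords)
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))  # centroid
    d_ctd = [math.dist(c, ctd) for c in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # atom closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # atom farthest from the centroid
    d_cst = [math.dist(c, cst) for c in coords]
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # atom farthest from fct
    d_ftf = [math.dist(c, ftf) for c in coords]
    return _moments(d_ctd) + _moments(d_cst) + _moments(d_fct) + _moments(d_ftf)


def usr_similarity(a, b):
    """Inverse Manhattan distance similarity in (0, 1]; 1 means identical descriptors."""
    manhattan = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 / (1.0 + manhattan / len(a))


# Hypothetical toy "molecule": five atom positions, not real data.
query = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.25, 0.0),
         (0.5, 2.0, 1.0), (3.0, 0.5, -1.0)]
# A rigidly translated copy should have (numerically) the same descriptors.
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in query]
# Stretching along x changes the shape, so the similarity should drop below 1.
stretched = [(2.0 * x, y, z) for x, y, z in query]

sim_shifted = usr_similarity(usr_descriptor(query), usr_descriptor(shifted))
sim_stretched = usr_similarity(usr_descriptor(query), usr_descriptor(stretched))
```

Because the descriptors are built only from interatomic distances, they do not depend on how the molecule is positioned or oriented, which is why no alignment step is needed before comparing two molecules. The machine learning models described in the abstract take descriptor vectors of this kind as input features in place of the pairwise similarity score.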
