Comparison of Deep Learning With Multiple Machine Learning Methods and Metrics Using Diverse Drug Discovery Data Sets

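Keywords: artificial intelligence, machine learning, deep learning, computer science, drug discovery, artificial neural network, advertising, support vector machine, chemical space, bioinformatics, biology, pharmacokinetics
Authors
Alexandru Korotcov, Valery Tkachenko, Daniel P. Russo, Sean Ekins
Source
Journal: Molecular Pharmaceutics [American Chemical Society]
Volume (issue): 14 (12): 4462-4475; Cited by: 308
Identifier
DOI: 10.1021/acs.molpharmaceut.7b00578
Abstract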

Machine learning methods have been applied to many data sets in pharmaceutical research for several decades. The relative ease and availability of fingerprint-type molecular descriptors paired with Bayesian methods resulted in the widespread use of this approach for a diverse array of end points relevant to drug discovery. Deep learning is the latest machine learning algorithm attracting attention for many pharmaceutical applications, from docking to virtual screening. Deep learning is based on an artificial neural network with multiple hidden layers and has found considerable traction in many artificial intelligence applications. We have previously suggested the need for a comparison of different machine learning methods with deep learning across an array of varying data sets applicable to pharmaceutical research. End points relevant to pharmaceutical research include absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) properties, as well as activity against pathogens and drug discovery data sets. In this study, we used data sets for solubility, probe-likeness, hERG, KCNQ1, bubonic plague, Chagas, tuberculosis, and malaria to compare different machine learning methods using FCFP6 fingerprints. These data sets represent whole-cell screens, individual proteins, and physicochemical properties, as well as a data set with a complex end point. Our aim was to assess whether deep learning offered any improvement in testing when assessed using an array of metrics including AUC (area under the receiver operating characteristic curve), F1 score, Cohen's kappa, Matthews correlation coefficient, and others. Based on ranked normalized scores for the metrics or data sets, deep neural networks (DNN) ranked higher than support vector machines (SVM), which in turn ranked higher than all the other machine learning methods. Visualizing these metrics for training and test sets using radar-type plots indicates when models are inferior or perhaps overtrained.
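The abstract names AUC, F1 score, Cohen's kappa and the Matthews correlation coefficient without defining them. The following is a minimal, dependency-free Python sketch of the standard binary-classification definitions, assuming labels 1 = active and 0 = inactive and a hypothetical 0.5 score threshold. The toy labels and scores are made up for illustration and do not come from the study's data sets; in practice scikit-learn's `roc_auc_score`, `f1_score`, `cohen_kappa_score` and `matthews_corrcoef` compute the same quantities.

```python
from math import sqrt

def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels where 1 = active, 0 = inactive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def f1_score(tp, fp, tn, fn):
    # Harmonic mean of precision and recall: 2TP / (2TP + FP + FN); TN is unused.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def cohen_kappa(tp, fp, tn, fn):
    # Observed agreement corrected for the agreement expected by chance
    # from the row and column totals of the confusion matrix.
    n = tp + fp + tn + fn
    p_obs = (tp + tn) / n
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp) if p_exp != 1 else 0.0

def matthews_corrcoef(tp, fp, tn, fn):
    # Correlation between true and predicted labels; ranges from -1 to +1.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def roc_auc(y_true, scores):
    # AUC equals the probability that a randomly chosen active is scored
    # above a randomly chosen inactive (ties count as half).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy example with made-up labels and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold
counts = confusion_counts(y_true, y_pred)         # (3, 1, 3, 1)
print(f1_score(*counts), cohen_kappa(*counts), matthews_corrcoef(*counts), roc_auc(y_true, scores))
# prints 0.75 0.5 0.5 0.9375
```

Note that AUC is computed from the continuous scores, while F1, kappa and MCC depend on the chosen threshold, which is one reason a single metric can rank models differently from the others.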
These results also suggest the need to assess deep learning further using multiple metrics, with much larger-scale comparisons and prospective testing, as well as assessment of fingerprints and DNN architectures beyond those used here.
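The abstract reports that methods were compared using "ranked normalized scores" but does not describe the normalization. As a hedged illustration only, the sketch below assumes one simple scheme: min-max normalize each metric across methods on a data set, average the normalized values per method, and sort by that mean. This is not necessarily the authors' procedure, and the method names (`method_A`, etc.) and scores are hypothetical.

```python
def min_max(values):
    """Scale values to [0, 1]; identical values all map to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def rank_methods(results):
    """Rank methods by mean min-max normalized score across metrics.

    results maps method -> {metric: value}; every metric is assumed to be
    'higher is better' and every method is assumed to report the same metrics.
    """
    methods = list(results)
    metrics = list(results[methods[0]])
    totals = {m: 0.0 for m in methods}
    for metric in metrics:
        normalized = min_max([results[m][metric] for m in methods])
        for m, v in zip(methods, normalized):
            totals[m] += v
    mean_scores = {m: totals[m] / len(metrics) for m in methods}
    ranking = sorted(methods, key=lambda m: mean_scores[m], reverse=True)
    return ranking, mean_scores

# Made-up scores for three hypothetical methods on one data set.
results = {
    "method_A": {"AUC": 0.90, "F1": 0.70, "MCC": 0.50},
    "method_B": {"AUC": 0.85, "F1": 0.75, "MCC": 0.45},
    "method_C": {"AUC": 0.80, "F1": 0.60, "MCC": 0.30},
}
ranking, mean_scores = rank_methods(results)
print(ranking)  # ['method_A', 'method_B', 'method_C']
```

Min-max scaling puts metrics with different ranges (for example MCC in [-1, 1] and AUC in [0, 1]) on a common scale before averaging; with only a few methods it is sensitive to a single outlying model, so averaging per-metric ranks is a common alternative.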