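Keywords
Chemistry
Machine learning
Docking (animal)
Function (biology)
Artificial intelligence
Protein ligand
Set (abstract data type)
Computer science
Training set
Bioinformatics
Drug discovery
Biology
Biochemistry
Programming language
Nursing department
Evolutionary biology
Medicine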
Authors
Fergus Boyles, Charlotte M. Deane, Garrett M. Morris
Identifier
DOI: 10.1021/acs.jcim.1c00096
Abstract
Machine learning scoring functions for protein–ligand binding affinity have been found to consistently outperform classical scoring functions when trained and tested on crystal structures of bound protein–ligand complexes. However, it is less clear how these methods perform when applied to docked poses of complexes. We explore how the use of docked rather than crystallographic poses for both training and testing affects the performance of machine learning scoring functions. Using the PDBbind Core Sets as benchmarks, we show that the performance of a structure-based machine learning scoring function trained and tested on docked poses is lower than that of the same scoring function trained and tested on crystallographic poses. We construct a hybrid scoring function by combining both structure-based and ligand-based features, and show that its ability to predict binding affinity using docked poses is comparable to that of purely structure-based scoring functions trained and tested on crystal poses. We also present a new, freely available validation set (the Updated DUD-E Diverse Subset) for binding affinity prediction using data from DUD-E and ChEMBL. Despite strong performance on docked poses of the PDBbind Core Sets, we find that our hybrid scoring function sometimes generalizes poorly to a protein target not represented in the training set, demonstrating the need for improved scoring functions and additional validation benchmarks.
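The hybrid scoring function described above combines structure-based and ligand-based features. The minimal sketch below illustrates only that general idea: concatenating the two feature sets into a single regression input per complex, then scoring held-out predictions by Pearson correlation (assumed here as the evaluation metric). It is not the authors' model. The feature dimensions, the synthetic data, the use of closed-form ridge regression as a stand-in learner, and the helper names `hybrid_features`, `fit_ridge`, `predict`, and `pearson_r` are all hypothetical assumptions made for illustration.

```python
import numpy as np


def hybrid_features(structure_feats, ligand_feats):
    """Concatenate per-complex structure-based and ligand-based feature vectors.

    Hypothetical helper: rows are complexes, columns are features.
    """
    return np.hstack([structure_feats, ligand_feats])


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    A stand-in learner chosen for brevity; the paper's learner may differ.
    """
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    penalty = alpha * np.eye(X1.shape[1])
    penalty[-1, -1] = 0.0  # do not shrink the intercept
    return np.linalg.solve(X1.T @ X1 + penalty, X1.T @ y)


def predict(w, X):
    """Apply fitted weights (last entry is the intercept) to a feature matrix."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return X1 @ w


def pearson_r(y_true, y_pred):
    """Pearson correlation between measured and predicted affinities."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_train, n_test = 200, 50
    n = n_train + n_test
    # Synthetic stand-ins, NOT PDBbind data: 36 hypothetical pose-derived
    # structural features and 10 hypothetical ligand descriptors per complex.
    struct = rng.normal(size=(n, 36))
    lig = rng.normal(size=(n, 10))
    X = hybrid_features(struct, lig)
    true_w = rng.normal(size=X.shape[1])
    y = X @ true_w + rng.normal(scale=0.5, size=n)  # noisy synthetic "affinities"

    w = fit_ridge(X[:n_train], y[:n_train])
    r = pearson_r(y[n_train:], predict(w, X[n_train:]))
    print(f"Pearson r on held-out synthetic complexes: {r:.2f}")
```

Because the ligand-based features do not depend on the pose, one plausible reading of the abstract is that they give the model information that survives inaccuracies in docked structures; the sketch does not test that claim, it only shows how the two feature sets would be fed to a single regressor.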