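Drug discovery
Ligand (biochemistry)
Computer science
State (computer science)
Artificial intelligence
Computational biology
Machine learning
Chemistry
Biology
Algorithm
Biochemistry
Receptor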
Authors
The-Chuong Trinh,Pierre Falson,Viet‐Khoa Tran‐Nguyen,Ahcène Boumendjel
Identifier
DOI:10.1021/acs.jcim.5c00374
Abstract
Artificial intelligence (AI) is revolutionizing drug discovery with unprecedented speed and efficiency. In computer-aided drug design, structure-based and ligand-based methodologies are the main driving forces for innovation. In cases where no experimental structure or high-confidence homology/AlphaFold-predicted model of the target is available in 3D, ligand-based strategies are generally preferable. Here, we aim to develop and evaluate new predictive AI models for ligand-based drug discovery. To illustrate our workflow, we propose, as an example, an ensemble classification model for Cdr1 inhibitor prediction. We leverage target-specific experimental data from different sources, various molecular feature types, and multiple state-of-the-art machine learning (ML) algorithms alongside a multi-instance 3D graph neural network (multiple conformations of a single molecule are considered). Bayesian hyperparameter tuning, stacked generalization, and soft voting are involved in our workflow. The final target-specific ensemble model benefits from the classification and screening power of those constituting it. On an external test set structurally dissimilar to the training data, its average precision is 0.755, its F1-score is 0.714, the area under the receiver operating characteristic curve is 0.884, and the balanced accuracy is 0.799. It gives a low false positive rate of 0.1236 on another test set outside the training chemical space, indicating its ability to avoid false positives. The present work highlights the potential of stacking ensemble ML and offers a rigorous general workflow to build ligand-based predictive AI models for other targets.
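As a rough illustration of two ideas named in the abstract, soft voting and the confusion-matrix metrics it reports (F1-score, balanced accuracy, false positive rate), here is a minimal sketch in plain Python. It is not the authors' implementation: the three "models", their probabilities, the labels, the equal weighting, and the 0.5 decision threshold are all hypothetical toy values chosen for this example.

```python
from statistics import mean


def soft_vote(prob_lists):
    """Average the active-class probability of each molecule across models."""
    return [mean(probs) for probs in zip(*prob_lists)]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 means active."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Compute F1, balanced accuracy, and false positive rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)   # recall on actives
    specificity = tn / (tn + fp)   # recall on inactives
    precision = tp / (tp + fp)
    return {
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "false_positive_rate": fp / (fp + tn),
    }


# Hypothetical toy data: 3 actives, 5 inactives, and three models'
# predicted probabilities that each molecule is active.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
model_a = [0.9, 0.6, 0.3, 0.2, 0.7, 0.1, 0.4, 0.2]
model_b = [0.8, 0.7, 0.5, 0.3, 0.4, 0.2, 0.3, 0.1]
model_c = [0.7, 0.5, 0.4, 0.1, 0.6, 0.3, 0.2, 0.3]

ensemble_prob = soft_vote([model_a, model_b, model_c])
# Assumed decision threshold of 0.5; the abstract does not state one.
y_pred = [1 if p >= 0.5 else 0 for p in ensemble_prob]
metrics = summarize(y_true, y_pred)
print(metrics)
```

Average precision and the area under the ROC curve, also reported in the abstract, are ranking metrics computed from the probabilities themselves rather than from thresholded predictions, so they are left out of this sketch.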