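Pipeline (software)
Machine learning
Experimental setup
Chemical space
Applicability domain
Training set
Artificial intelligence
Chemistry
Data set
Drug
Set (abstract data type)
Drug discovery
Design quality
Experimental data
Computer science
Quantitative structure-activity relationship
Pharmacology
Statistics
Medicine
Physical chemistry
Biochemistry
Particle size
Programming language
Mathematics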
Authors
Xiaoyu Ding, Rongrong Cui, Jie Yu, Tiantian Liu, Tingfei Zhu, Dingyan Wang, J. Morris Chang, Zisheng Fan, Xiaomeng Liu, Kaixian Chen, Hualiang Jiang, Xutong Li, Xiaomin Luo, Mingyue Zheng
Identifier
DOI:10.1021/acs.jmedchem.1c01683
Abstract
The success of artificial intelligence (AI) models has been limited by their need for large amounts of high-quality training data, which is the opposite of the situation in most drug discovery pipelines. Active learning (AL) is a subfield of AI that focuses on algorithms that select the data they need to improve their models. Here, we propose a two-phase AL pipeline and apply it to the prediction of oral plasma exposure of drugs. In phase I, the AL-based model showed a remarkable capability to sample informative data from a noisy data set: using only 30% of the training data, it reached an accuracy of 0.856 on an independent test set. In phase II, the AL-based model explored a large, diverse chemical space (855K samples) for experimental testing and feedback. Improved accuracy and 50K new high-confidence predictions were observed, suggesting that the model's applicability domain was significantly expanded.
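The abstract does not state the model, features, or query strategy used in the two-phase pipeline. The snippet below is therefore only a generic sketch of pool-based active learning with uncertainty sampling and is not the authors' method. A model is trained on a small labeled subset, and the unlabeled item it is least sure about (predicted probability closest to 0.5) is sent to an "oracle" for labeling. The tiny logistic-regression model, the toy 1-D data, and the names `train_logreg`, `active_learning_loop`, and `oracle` are all hypothetical. In a real setting, the oracle would correspond to looking up an existing label (as in phase I) or running an experiment (as in phase II).

```python
import math
import random


def predict_proba(w, b, x):
    """Logistic-regression probability of the positive class for feature vector x."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    z = max(-500.0, min(500.0, z))  # clip to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit a tiny logistic-regression model with full-batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/dz for this sample
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def active_learning_loop(X_pool, oracle, n_initial=4, n_queries=10, seed=0):
    """Pool-based active learning with uncertainty sampling (hypothetical sketch).

    oracle(i) stands in for whatever supplies the label of pool item i
    (a database lookup or a wet-lab measurement).
    """
    rng = random.Random(seed)
    order = list(range(len(X_pool)))
    rng.shuffle(order)
    labeled, unlabeled = order[:n_initial], order[n_initial:]
    labels = {i: oracle(i) for i in labeled}

    for _ in range(n_queries):
        if not unlabeled:
            break
        w, b = train_logreg([X_pool[i] for i in labeled], [labels[i] for i in labeled])
        # Query the unlabeled item whose predicted probability is closest to 0.5.
        query = min(unlabeled, key=lambda i: abs(predict_proba(w, b, X_pool[i]) - 0.5))
        unlabeled.remove(query)
        labeled.append(query)
        labels[query] = oracle(query)

    model = train_logreg([X_pool[i] for i in labeled], [labels[i] for i in labeled])
    return labeled, labels, model


if __name__ == "__main__":
    # Toy 1-D pool: 41 points in [-1, 1]; the hidden label is 1 when x > 0.
    X_pool = [[(i - 20) / 20] for i in range(41)]
    labeled, labels, (w, b) = active_learning_loop(
        X_pool, lambda i: 1 if X_pool[i][0] > 0 else 0
    )
    print(len(labeled))  # 14 (4 initial labels + 10 queries)
```

Querying near the current decision boundary is one simple way to spend a limited labeling budget on informative samples. Other strategies, such as query-by-committee or diversity-based selection, fit into the same loop by swapping out the `min(...)` selection step.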