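Bioinformatics
Clearance
Training set
In vivo
Computer science
Computational biology
Chemistry
Artificial intelligence
Medicine
Biology
Urology
Biochemistry
Gene
Biotechnology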
Authors
Franco Lombardo,Jörg Bentzien,Giuliano Berellini,Ingo Muegge
Identifier
DOI:10.1021/acs.molpharmaceut.3c00812
Abstract
Predicting human clearance with high accuracy from in silico-derived parameters alone is highly desirable, as it is fast, saves in vitro resources, and is animal-sparing. We derived random forest (RF) models from 1340 compounds with human intravenous pharmacokinetic (PK) data, the largest data set publicly available today. To assess the general applicability of the RF models, we systematically removed structural-therapeutic class analogues and other compounds with structural similarity from the training sets. For a quasi-prospective test set of 343 compounds, we show that RF models devoid of structurally similar compounds in the training set predict human clearance with a geometric mean fold error (GMFE) of 3.3. While the observed GMFE illustrates how difficult it is to generate a useful model that is broadly applicable, we posit that our RF models yield a more realistic assessment of how well human clearance can be predicted prospectively. We deployed the conformal prediction formalism to assess the model applicability and to determine the prediction confidence intervals for each prediction. We observed that clearance can be predicted better for renally cleared compounds than for other clearance mechanisms. We show that applying a classification model for predicting renal clearance identifies a subset of compounds for which clearance can be predicted with higher accuracy, yielding a GMFE of 2.3. In addition, our in silico RF human clearance models compared well to models derived from scaling human hepatocytes or preclinical in vivo data.
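The abstract reports accuracy as a geometric mean fold error (GMFE) but does not spell out the formula. The sketch below assumes the commonly used definition, GMFE = 10^(mean |log10(predicted / observed)|), which may differ in detail from the authors' implementation. The function name `gmfe` and the clearance values are hypothetical and serve only as illustration.

```python
import math


def gmfe(predicted, observed):
    """Geometric mean fold error, assuming the common definition
    10 ** mean(|log10(predicted / observed)|).

    Both inputs are sequences of positive values in the same units.
    A result of 1.0 means perfect agreement; 3.0 means predictions
    are off by a factor of three on average, in either direction.
    """
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(p <= 0 for p in predicted) or any(o <= 0 for o in observed):
        raise ValueError("clearance values must be positive")
    abs_log_errors = [
        abs(math.log10(p / o)) for p, o in zip(predicted, observed)
    ]
    return 10 ** (sum(abs_log_errors) / len(abs_log_errors))


if __name__ == "__main__":
    # Hypothetical clearance values (e.g. mL/min/kg), not taken from the paper.
    predicted_cl = [5.0, 12.0, 0.8, 20.0]
    observed_cl = [2.5, 10.0, 2.4, 18.0]
    print(f"GMFE = {gmfe(predicted_cl, observed_cl):.2f}")
```

Because the error is averaged on a log scale, a twofold overprediction and a twofold underprediction contribute equally. This is why the metric is often preferred over a plain mean error for quantities such as clearance that span several orders of magnitude.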