Uncertainty Quantification in Molecular Machine Learning for Property Predictions under Data Shifts

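Keywords: Uncertainty quantification; Property (philosophy); Robustness (evolution); Benchmarking; Computer science; Artificial intelligence; Benchmark (surveying); Machine learning; Chemical space; Bayesian probability; Data set; Set (abstract data type); Big data; Data mining; Applicability domain; Measurement uncertainty; Reliability (semiconductor); Bayesian inference; Synthetic data; Uncertainty analysis; Mean squared prediction error; Bayes' theorem; Experimental data; Predictive modelling; Computational model; Drug discovery; Property value; Training set; Sensitivity analysis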

Authors

Raquel Parrondo-Pizarro, Jessica Lanini, Raquel Rodríguez-Pérez

Source

Journal: Journal of Chemical Information and Modeling [American Chemical Society]
Date: 2026-01-14  Volume (issue): 66 (2): 923-935

Links

nih.gov, doi.org

Identifiers

DOI: 10.1021/acs.jcim.5c02381

Abstract

Drug discovery and medicinal chemistry efforts are increasingly influenced by machine learning (ML), with compound property prediction as a central application. ML models have demonstrated strong performance in predicting various compound properties from chemical structure. However, these models can exhibit varying levels of prediction error, making uncertainty quantification (UQ) essential for informed decisions. Standard UQ metrics include the distance to the molecules in the training set and prediction variance, obtained through methods such as model ensembles or Bayesian modeling. Although several UQ methodologies have been developed in recent years, no single approach has consistently outperformed the others. Herein, we present a comprehensive benchmark of UQ strategies for ML-based prediction of absorption, distribution, metabolism, and excretion (ADME) properties, using both in-house and public data sets. We employed the recently introduced UNIQUE (UNcertaInty QUantification bEnchmarking) framework and evaluated UQ method performance under data shifts. Our findings indicate that data-based UQ metrics (e.g., chemical distance) and model-based UQ metrics (e.g., predicted value and variance) may capture complementary aspects of uncertainty. Their combination through error models, designed to predict the original ML model's error, yielded higher-quality uncertainty estimates. These error models emerged as a promising strategy for enhancing UQ, showing robustness under various degrees and types of data shift. Taken together, our work highlights the potential of combining diverse UQ metrics and error modeling to improve reliability in molecular property prediction. By establishing standardized evaluation setups and assessing UQ under data shifts, we provide a foundation for future UQ method development and benchmarking in the field.
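
The error-model idea summarized in the abstract can be illustrated with a minimal sketch. It does not use the UNIQUE framework, the authors' models, or any ADME data. Everything here is an assumption made for illustration: the 1-D synthetic data, the bootstrap ensemble of linear models (standing in for the property predictor), the nearest-training-point distance (standing in for chemical distance), the k-nearest-neighbour error model, and all function and variable names. The sketch computes one data-based and one model-based UQ metric, then fits a second model on held-out data to predict the first model's absolute error from both metrics.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def true_property(x):
    # Hypothetical ground truth standing in for a measured compound property.
    return math.sin(x) + 0.1 * x


def make_data(n, lo, hi):
    # Synthetic 1-D "compounds" x with noisy property values y.
    xs = [random.uniform(lo, hi) for _ in range(n)]
    ys = [true_property(x) + random.gauss(0.0, 0.05) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    # Ordinary least squares for y = a + b * x.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


# Training data covers [0, 2]; calibration and test data cover [0, 4],
# so roughly half of those points lie outside the training range (a data shift).
train_x, train_y = make_data(60, 0.0, 2.0)
calib_x, calib_y = make_data(60, 0.0, 4.0)
test_x, test_y = make_data(60, 0.0, 4.0)

# Property model: a bootstrap ensemble of 20 linear fits.
ensemble = []
for _ in range(20):
    idx = [random.randrange(len(train_x)) for _ in train_x]
    ensemble.append(fit_line([train_x[i] for i in idx], [train_y[i] for i in idx]))


def uq_metrics(x):
    # Returns the ensemble prediction plus two UQ metrics:
    # distance to the nearest training point (data-based) and
    # variance across ensemble members (model-based).
    preds = [a + b * x for a, b in ensemble]
    distance = min(abs(x - t) for t in train_x)
    return statistics.fmean(preds), (distance, statistics.pvariance(preds))


def fit_error_model(features, abs_errors, k=5):
    # k-NN regressor in standardized UQ-metric space; predicts absolute error.
    cols = list(zip(*features))
    mus = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    scaled = [[(v - m) / s for v, m, s in zip(f, mus, sds)] for f in features]

    def predict(feature):
        q = [(v - m) / s for v, m, s in zip(feature, mus, sds)]
        order = sorted(range(len(scaled)), key=lambda i: math.dist(q, scaled[i]))
        return statistics.fmean([abs_errors[i] for i in order[:k]])

    return predict


# Fit the error model on calibration points the property model never saw.
calib = [uq_metrics(x) for x in calib_x]
calib_err = [abs(pred - y) for (pred, _), y in zip(calib, calib_y)]
error_model = fit_error_model([f for _, f in calib], calib_err)

# Apply it to the test set and compare in-range versus shifted points.
test = [uq_metrics(x) for x in test_x]
test_err = [abs(pred - y) for (pred, _), y in zip(test, test_y)]
predicted_err = [error_model(f) for _, f in test]

in_domain = [p for x, p in zip(test_x, predicted_err) if x <= 2.0]
shifted = [p for x, p in zip(test_x, predicted_err) if x > 2.0]
print("mean true abs error:     ", statistics.fmean(test_err))
print("mean predicted, in range:", statistics.fmean(in_domain))
print("mean predicted, shifted: ", statistics.fmean(shifted))
```

In this toy setting the error model should assign larger predicted errors to points outside the training range, which is the behaviour a useful uncertainty estimate needs under data shift. A real benchmark would replace the 1-D distance with a chemical similarity measure and assess the estimates with ranking- and calibration-based metrics.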
