Comparison of Scaling Methods to Obtain Calibrated Probabilities of Activity for Protein–Ligand Predictions

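Brier score; Calibration; Context (archaeology); Estimator; Statistics; Computer science; Random forest; Bayes' theorem; Probability distribution; Artificial intelligence; Algorithm; Machine learning; Mathematics; Bayesian probability; Biology; Paleontology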

Authors

Lewis Mervin, Avid M. Afzal, Ola Engkvist, Andreas Bender

Source

Journal: Journal of Chemical Information and Modeling [American Chemical Society]
Date: 2020-08-31 Volume (Issue): 60 (10): 4546-4559 Citations: 11

Identifier

DOI: 10.1021/acs.jcim.0c00476

Abstract

In the context of bioactivity prediction, the question of how to calibrate a score produced by a machine learning method into a probability of binding to a protein target is not yet satisfactorily addressed. In this study, we compared the performance of three such methods, namely, Platt scaling (PS), isotonic regression (IR), and Venn–ABERS predictors (VA), in calibrating prediction scores obtained from ligand–target prediction comprising the Naïve Bayes, support vector machines, and random forest (RF) algorithms. Calibration quality was assessed on bioactivity data available at AstraZeneca for 40 million data points (compound–target pairs) across 2112 targets and performance was assessed using stratified shuffle split (SSS) and leave 20% of scaffolds out (L20SO) validation. VA achieved the best calibration performances across all machine learning algorithms and cross validation methods tested and also the lowest (best) Brier score loss (mean squared difference between the outputted probability estimates assigned to a compound and the actual outcome). In comparison, the PS and IR methods can actually degrade the assigned probability estimates, particularly for the RF for SSS and during L20SO. Sphere exclusion, a method to sample additional (putative) inactive compounds, was shown to inflate the overall Brier score loss performance, through the artificial requirement for inactive molecules to be dissimilar to active compounds, but was shown to result in overconfident estimators. VA was able to successfully calibrate the probability estimates for even small calibration sets. The multiprobability values (lower and upper probability boundary intervals) were shown to produce large discordance for test set molecules that are neither very similar nor very dissimilar to the active training set, which were hence difficult to predict, suggesting that multiprobability discordance can be used as an estimate for target prediction uncertainty. 
Overall, we were able to show in this work that VA scaling of target prediction models is able to improve probability estimates in all testing instances and is currently being applied for in-house approaches.
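
Two quantities named in the abstract can be illustrated with a minimal sketch: the Brier score loss (mean squared difference between the probability estimate and the actual outcome) and isotonic regression as a score-to-probability mapping. The code below is not the authors' implementation. The function names (`brier_score`, `isotonic_fit`, `discordance`) and the toy data are assumptions made for illustration, and isotonic regression is fitted here with the standard pool-adjacent-violators algorithm in plain Python. The `discordance` helper simply takes the width of a lower/upper probability interval, which is one plausible reading of "multiprobability discordance".

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def isotonic_fit(scores, labels):
    """Fit a non-decreasing map from scores to probabilities (pool adjacent violators).

    Returns the sorted scores and the fitted value for each of them.
    Illustrative only: tied scores are not pooled in this sketch.
    """
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block holds [sum of labels, number of points]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [s for s, _ in pairs], fitted


def discordance(lower, upper):
    """Width of a lower/upper probability interval (hypothetical uncertainty proxy)."""
    return upper - lower


# Toy calibration set (made-up values): raw model scores and true activity labels.
raw_scores = [0.1, 0.2, 0.3, 0.4]
activity = [0, 1, 0, 1]

sorted_scores, calibrated = isotonic_fit(raw_scores, activity)
print(calibrated)
print(brier_score(raw_scores, activity), brier_score(calibrated, activity))
```

On the calibration data itself, the isotonic fit cannot have a higher Brier score than the raw scores, since the identity map is itself non-decreasing. The abstract's point is that this need not carry over to held-out compounds, so in practice the score would be computed on a separate test set.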
