hERG
Interpretability
Quantitative structure-activity relationship
ADME
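Computer science
Cross-validation
Drug discovery
Applicability domain
Potency
Data mining
Artificial intelligence
Machine learning
Chemistry
Drug
Pharmacology
Medicine
Potassium channel
Endocrinology
In vitro
Biochemistry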
Authors
Kiril Lanevskij,Remigijus Didžiapetris,Andrius Sazonovas
Identifier
DOI:10.1007/s10822-022-00483-0
Abstract
In an earlier study (Didziapetris R & Lanevskij K (2016). J Comput Aided Mol Des. 30:1175-1188) we collected a database of publicly available hERG inhibition data for almost 6700 drug-like molecules and built a probabilistic Gradient Boosting classifier with a minimal set of physicochemical descriptors (log P, pKa, molecular size and topology parameters). This approach favored interpretability over statistical performance but still achieved an overall classification accuracy of 75%. In the current follow-up work we expanded the database (provided in Supplementary Information) to almost 9400 molecules and performed temporal validation of the model on a set of novel chemicals from recently published lead optimization projects. Validation results showed almost no performance degradation compared to the original study. Additionally, we rebuilt the model using the Accelerated Failure Time (AFT) learning objective in XGBoost, which accepts both the quantitative and the censored data often reported in protein inhibition studies. The new model achieved a similar level of accuracy in discerning hERG blockers from non-blockers at a 10 µM threshold, which can be regarded as close to the performance ceiling for methods that aim to describe only non-specific ligand interactions with hERG. Unlike the classifier, however, this model outputs quantitative potency values (IC50) and is not tied to a particular classification cut-off. pIC50 from patch-clamp measurements can be predicted with R² ≈ 0.4 and MAE < 0.5, which enables ranking of ligands according to their expected potency levels. The employed approach can be valuable for quantitative modeling of various ADME and drug safety endpoints with a high prevalence of censored data.
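To illustrate how an AFT-style objective can use both exact and censored potency data, the following is a minimal standard-library sketch, not the authors' code. It encodes reported IC50 values (in µM) as lower/upper bounds and evaluates a per-observation negative log-likelihood under a normal error model on ln(IC50). The function names (`parse_ic50`, `aft_nll`, `pic50_from_um`, `is_blocker`), the string format of the inputs, the normal distribution, and the scale `sigma=1.0` are all assumptions made for illustration. In XGBoost itself, the comparable setup uses the `survival:aft` objective with `label_lower_bound` and `label_upper_bound` fields on the `DMatrix`.

```python
import math


def parse_ic50(text):
    """Convert a reported IC50 string (µM) into (lower, upper) bounds.

    Hypothetical input format: "12.5" is an exact value, ">30" is
    right-censored (weaker than 30 µM), "<0.1" is left-censored.
    """
    text = text.strip()
    if text.startswith(">"):
        return float(text[1:]), math.inf
    if text.startswith("<"):
        return 0.0, float(text[1:])
    value = float(text)
    return value, value


def _norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def aft_nll(pred_ln_ic50, lower, upper, sigma=1.0):
    """Negative log-likelihood of one observation, assuming
    ln(IC50) = pred_ln_ic50 + sigma * Z with Z ~ N(0, 1)."""
    if lower == upper:
        # Exact measurement: use the density of the observed value.
        z = (math.log(lower) - pred_ln_ic50) / sigma
        return -math.log(_norm_pdf(z) / (sigma * lower))
    # Censored measurement: use the probability mass inside the interval.
    cdf_hi = 1.0 if math.isinf(upper) else _norm_cdf(
        (math.log(upper) - pred_ln_ic50) / sigma)
    cdf_lo = 0.0 if lower <= 0.0 else _norm_cdf(
        (math.log(lower) - pred_ln_ic50) / sigma)
    return -math.log(max(cdf_hi - cdf_lo, 1e-12))


def pic50_from_um(ic50_um):
    """pIC50 = -log10(IC50 in mol/L) = 6 - log10(IC50 in µM)."""
    return 6.0 - math.log10(ic50_um)


def is_blocker(pred_ic50_um, threshold_um=10.0):
    """Apply a classification cut-off to a quantitative prediction."""
    return pred_ic50_um < threshold_um


if __name__ == "__main__":
    lower, upper = parse_ic50(">30")
    # A weak predicted potency (100 µM) fits ">30 µM" better than 1 µM does.
    print(aft_nll(math.log(100.0), lower, upper))
    print(aft_nll(math.log(1.0), lower, upper))
```

Because the model predicts a continuous IC50, a cut-off such as 10 µM (pIC50 = 5) can be applied afterwards, or changed, without retraining.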