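Support vector machine
Artificial intelligence
Chemistry
Machine learning
Random forest
Molecular descriptor
Polar surface area
Computer science
Biological system
Chromatography
Pattern recognition (psychology)
Quantitative structure-activity relationship
Molecule
Biology
Organic chemistry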
Authors
Xinhao Qu,Chen Jiang,Mengyi Shan,Ke Wang,Jing Chen,Qiming Zhao,Youhong Hu,Jia Liu,Luping Qin,Gang Cheng
Identifier
DOI:10.1021/acs.jcim.4c01732
Abstract
Proteolysis-targeting chimeras (PROTACs) are heterobifunctional molecules that target undruggable proteins, enhance selectivity, and prevent target accumulation through catalytic activity. The unique structure of PROTACs presents challenges in structural identification and drug design. Liquid chromatography (LC), combined with mass spectrometry (MS), enhances compound annotation by providing essential retention time (RT) data, especially when MS alone is insufficient. However, predicting RT for PROTACs remains challenging. To address this, we compiled the PROTAC-RT data set from the literature and evaluated the performance of four machine learning algorithms, namely extreme gradient boosting (XGBoost), random forest (RF), K-nearest neighbor (KNN), and support vector machines (SVM), as well as a deep learning model, a fully connected neural network (FCNN), using 24 molecular fingerprints and descriptors. By screening combinations of molecular fingerprints, descriptors, and chromatographic condition descriptors (CCs), we developed an optimized XGBoost model (XGBoost + moe206 + Path + Charge + CCs) that achieved an R² of 0.958 ± 0.027 and an RMSE of 0.934 ± 0.412. After hyperparameter tuning, the model's R² improved to 0.963 ± 0.023, with an RMSE of 0.896 ± 0.374. The model showed strong predictive accuracy under new chromatographic separation conditions and was validated using six experimentally determined compounds. SHapley Additive exPlanations (SHAP) analysis not only highlights the advantages of XGBoost but also emphasizes the importance of CCs and molecular features such as bond variability, van der Waals surface area, and atomic charge states. The optimized XGBoost model combines moe206, Path, and Charge descriptors with CCs, providing a fast and precise method for predicting the RT of PROTAC compounds and thus facilitating their annotation.
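The abstract reports R² and RMSE as a mean ± spread. As a minimal sketch of how such figures can be computed, the snippet below evaluates both metrics on each held-out split and then summarizes them. The abstract does not state how the ± values were obtained, so treating them as a sample standard deviation across splits is an assumption here. The helper names (`r2_score`, `rmse`, `summarize`) and the toy retention times are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, stdev


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the target."""
    return math.sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def summarize(folds):
    """Summarize metrics over held-out splits.

    `folds` is a list of (y_true, y_pred) pairs, one per split.
    Returns {metric: (mean, sample standard deviation)}.
    Assumption: the spread is the sample SD across splits.
    """
    r2s = [r2_score(t, p) for t, p in folds]
    rmses = [rmse(t, p) for t, p in folds]
    return {
        "R2": (mean(r2s), stdev(r2s)),
        "RMSE": (mean(rmses), stdev(rmses)),
    }


# Toy retention times, invented purely for illustration.
toy_folds = [
    ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),  # perfect predictions
    ([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]),  # always predicts the mean
]
summary = summarize(toy_folds)
for name, (avg, sd) in summary.items():
    print(f"{name}: {avg:.3f} ± {sd:.3f}")
```

In a real evaluation, each `(y_true, y_pred)` pair would come from a model fitted on the training portion of one split and applied to its held-out portion.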