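Biodegradation
Random forest
Quantitative structure-activity relationship
Chemistry
Hydrogen bond
Support vector machine
Solubility
Artificial intelligence
Test set
Correlation coefficient
Applicability domain
Molecular descriptor
Machine learning
Molecule
Biological system
Computer science
Organic chemistry
Biology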
Authors
Hongyan Yin,Cheng Ting Lin,Yujia Tian,Aixia Yan
Identifier
DOI:10.1021/acs.chemrestox.2c00330
Abstract
Persistent contaminants from various industries already pose significant risks to the environment and public health. In this study, a data set of 1306 not readily biodegradable (NRB) and 622 readily biodegradable (RB) chemicals was collected and characterized by CORINA descriptors, MACCS fingerprints, and ECFP_4 fingerprints. We used decision tree (DT), support vector machine (SVM), random forest (RF), and deep neural network (DNN) algorithms to build 34 classification models that predict the biodegradability of compounds. The best model (model 5F), built with a Transformer-CNN algorithm, achieved a balanced accuracy of 86.29% and a Matthews correlation coefficient of 0.71 on the test set. Analysis of the top 10 CORINA descriptors used for modeling showed that solubility, π/σ atom charges, number of rotatable bonds, lone-pair/π/σ atom electronegativities, molecular weight, and the number of nitrogen-based hydrogen-bond acceptors are critical determinants of biodegradability. The substructure analysis confirmed earlier findings that aromatic rings and nitrogen or halogen substituents in a molecule hinder its biodegradation, whereas ester and carboxyl groups promote biodegradability. We also identified representative fragments affecting biodegradability by analyzing differences in substructural fragment frequencies between the NRB and RB compounds. These results can guide the discovery and design of compounds with good biodegradability.
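The test-set figures above are balanced accuracy and the Matthews correlation coefficient (MCC). Both are suited to this data set because it is imbalanced (1306 NRB vs 622 RB, roughly 2.1:1), so plain accuracy would reward a model that mostly predicts the majority class. The sketch below computes both metrics from binary labels using only the Python standard library. It assumes RB is coded as 1 (the positive class) and NRB as 0; that labeling convention and the function names are illustrative assumptions, not the authors' code, and the toy predictions are invented rather than taken from the study.

```python
import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), treating label 1 as the positive class.

    Assumption for illustration: 1 = readily biodegradable (RB),
    0 = not readily biodegradable (NRB).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC in [-1, 1]; returns 0.0 when any marginal total is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical toy predictions (NOT data from the study): 10 RB and 20 NRB compounds.
    y_true = [1] * 10 + [0] * 20
    y_pred = [1] * 8 + [0] * 2 + [0] * 18 + [1] * 2
    counts = confusion_counts(y_true, y_pred)
    print(counts)                                   # (8, 18, 2, 2)
    print(round(balanced_accuracy(*counts), 4))     # 0.85
    print(round(matthews_corrcoef(*counts), 4))     # 0.7
```

As a sanity check on why these metrics were preferred: a trivial classifier that labels all 1928 compounds as NRB would reach about 67.7% plain accuracy (1306/1928), yet `balanced_accuracy(0, 1306, 0, 622)` returns 0.5 and `matthews_corrcoef(0, 1306, 0, 622)` returns 0.0.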