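Stacking
Chemical space
Molecular descriptor
Representation (politics)
Prioritization
Set (abstract data type)
Chemistry
Computer science
Cheminformatics
Toxicity
Base (topology)
Aromaticity
Biochemical engineering
Combinatorial chemistry
Biological system
Quantitative structure-activity relationship
Organic chemicals
Computational chemistry
Environmental chemistry
Particulates
Data set
Cytotoxicity
Environmental science
Molecule
Evaluation methods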
Authors
Ye Lü, Yangyang Wu, X. P. Li
Identifier
DOI:10.1021/acs.est.5c17667
Abstract
Although the health impacts of fine particulate matter (PM2.5) are primarily attributed to its chemically diverse composition rather than to mass concentration, assessing the toxicity of PM2.5 constituent compounds remains highly challenging due to chemical complexity and limited experimental scalability. This study introduces an interpretable machine learning (ML) framework using a curated A549 cytotoxicity data set (19,841 compounds) that integrates customized Molecular Access System (MACCS) fingerprints, six base models, and a stacking ensemble meta-model, all optimized with a biobjective strategy to assess the toxicity of organic compounds in PM2.5. The stacking ensemble model demonstrated satisfactory performance (test AUC > 0.8), exhibiting good generalization, adaptability, and robustness. Nontargeted analysis generated and experimentally validated a prediction set from the Hong Kong PM2.5 samples (51 of 387 compounds confirmed from 13 classes), demonstrating broad applicability on an independent Nanjing PM2.5 set (572 compounds). Key substructures driving toxicity, identified through a SHapley Additive exPlanations (SHAP) analysis and cell experimental validation, revealed that PM2.5 compounds with aromatic rings and nitrogen-based functional groups (e.g., aromatic α,β-unsaturated ketones, aromatic amines, aromatic nitro compounds, and tertiary amines) likely contribute to high toxicity. The derived "structure-toxicity" rules narrow the search space from thousands of compounds to those containing critical substructures, enabling efficient prioritization of toxic components and providing a foundation for improving model specificity and predictive accuracy in future studies.
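The prioritization step described at the end of the abstract, narrowing a large candidate list to compounds that carry critical substructures, can be illustrated with a minimal sketch. This is not the authors' pipeline: the substructure labels, the `Compound` class, the `prioritize` function, and the compound names are all hypothetical, and ranking by the number of matched substructures is an assumption made for illustration (the study itself derives its rules from SHAP analysis and cell experiments).

```python
from dataclasses import dataclass

# Hypothetical labels for the substructure classes named in the abstract.
CRITICAL_SUBSTRUCTURES = frozenset({
    "aromatic_enone",   # aromatic alpha,beta-unsaturated ketone
    "aromatic_amine",
    "aromatic_nitro",
    "tertiary_amine",
})


@dataclass(frozen=True)
class Compound:
    name: str
    # Labels assumed to come from an upstream fingerprinting step
    # (for example, keys derived from MACCS-style fingerprints).
    substructures: frozenset


def prioritize(compounds, critical=CRITICAL_SUBSTRUCTURES):
    """Keep compounds with at least one critical substructure.

    Results are ordered by number of matched substructures (descending),
    then by name. The ordering rule is an illustrative assumption.
    """
    scored = [(len(c.substructures & critical), c) for c in compounds]
    flagged = [(n, c) for n, c in scored if n > 0]
    flagged.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [c for _, c in flagged]


# Made-up candidates, not data from the study.
candidates = [
    Compound("compound_A", frozenset({"aromatic_nitro", "aromatic_amine"})),
    Compound("compound_B", frozenset({"carboxylic_acid"})),
    Compound("compound_C", frozenset({"tertiary_amine"})),
]
shortlist = prioritize(candidates)
```

Here `shortlist` keeps only the candidates that contain at least one flagged substructure, so follow-up toxicity testing can focus on a smaller set.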