Molecular Representation and Closed-Loop Validation for Toxicity Assessment of Organic Compounds in Ambient Air PM 2.5

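Stacking; Chemical space; Molecular descriptor; Representation (politics); Prioritization; Set (abstract data type); Chemistry; Computer science; Cheminformatics; Toxicity; Base (topology); Aromaticity; Biochemical engineering; Combinatorial chemistry; Biological system; Quantitative structure-activity relationship; Organic chemicals; Computational chemistry; Environmental chemistry; Particulates; Data set; Cytotoxicity; Environmental science; Molecule; Evaluation methods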

Authors

Ye Lü,Yangyang Wu,X. P. Li

Source

Journal: Environmental Science & Technology [American Chemical Society]
Date: 2026-02-25

Links

nih.gov, doi.org

Identifiers

DOI: 10.1021/acs.est.5c17667

Abstract

Although the health impacts of fine particulate matter (PM2.5) are primarily attributed to its chemically diverse composition rather than to mass concentration, assessing the toxicity of PM2.5 constituent compounds remains highly challenging due to chemical complexity and limited experimental scalability. This study introduces an interpretable machine learning (ML) framework using a curated A549 cytotoxicity data set (19,841 compounds) that integrates customized Molecular Access System (MACCS) fingerprints, six base models, and a stacking ensemble meta-model, all optimized with a biobjective strategy to assess the toxicity of organic compounds in PM2.5. The stacking ensemble model demonstrated satisfactory performance (test AUC > 0.8), exhibiting good generalization, adaptability, and robustness. Nontargeted analysis generated and experimentally validated a prediction set from the Hong Kong PM2.5 samples (51 of 387 compounds confirmed from 13 classes), demonstrating broad applicability on an independent Nanjing PM2.5 set (572 compounds). Key substructures driving toxicity, identified through a SHapley Additive exPlanations (SHAP) analysis and cell experimental validation, revealed that PM2.5 compounds with aromatic rings and nitrogen-based functional groups (e.g., aromatic α,β-unsaturated ketones, aromatic amines, aromatic nitro compounds, and tertiary amines) likely contribute to high toxicity. The derived "structure-toxicity" rules narrow the search space from thousands of compounds to those containing critical substructures, enabling efficient prioritization of toxic components and providing a foundation for improving model specificity and predictive accuracy in future studies.
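
The abstract reports model quality as a test AUC above 0.8. As a reading aid, the following is a minimal sketch of how a ROC AUC can be computed from binary toxicity labels and predicted scores, using the pairwise (Mann-Whitney) formulation. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not data or code from the paper, and the authors' own evaluation pipeline is not described in the abstract.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (toxic, non-toxic) pairs ranked correctly.

    labels: iterable of 0/1 (1 = toxic); scores: predicted toxicity scores.
    Tied scores count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up values for illustration only (not from the A549 data set).
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.1, 0.7]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value above 0.8 means that a randomly chosen toxic compound is scored above a randomly chosen non-toxic one in more than 80% of pairs.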
