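Chemistry
Workflow
Mass spectrometry
Environmental chemistry
Fragmentation (computing)
Mass spectrum
Random forest
Database
Chromatography
Artificial intelligence
Computer science
Operating system
Authors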
Zixuan Zhang,Xin Xu,Shipei Xing,Changzhi Shi,Zecang You,Xiaojun Deng,Ling Tan,Zhe Mo,Mingliang Fang
Identifier
DOI:10.1021/acs.analchem.4c04249
Abstract
Polycyclic aromatic hydrocarbons (PAHs) are pervasive environmental pollutants with significant health risks due to their carcinogenic, mutagenic, and teratogenic properties. Traditional methods for PAH identification, primarily relying on gas chromatography–mass spectrometry (GC–MS), utilize spectral library searches together with other techniques, such as mass defect analysis. However, these methods are limited by incomplete spectral libraries and a high false positive rate. Here, we present PAH-Finder, a data-driven workflow that integrates machine learning with high-resolution mass spectrometry (HRMS). PAH-Finder introduces a novel approach to evaluate the fragment distribution of PAH backbones in MS spectra by normalizing fragment m/z values to a 0–100% range relative to the molecular ion peak. Seven machine learning features capture PAH fragmentation characteristics, and a random forest model trained on 98 PAH spectra and 1003 background spectra achieved an F1 score of ∼0.9 in 5-fold cross-validation. Additionally, PAH-Finder leverages the presence of doubly charged fragments and molecular formula prediction to enhance identification accuracy. In a case study, PAH-Finder identified 135 PAHs, including 7 types of previously unreported PAH formulas in particulate matter samples, demonstrating a 246% increase in annotation efficiency compared to the NIST20 library search. It also identified 32 heteroatom-doped PAHs not included in the training data set, showing that it generalizes beyond the compound classes it was trained on. PAH-Finder's high accuracy in detecting a broad spectrum of PAHs facilitates efficient data processing and interpretation for nontargeted analysis, supporting both the study of air pollution and public health protection. PAH-Finder is freely available on GitHub (https://github.com/FangLabNTU/PAH-Finder).
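The normalization step described in the abstract can be illustrated with a minimal sketch. It is not taken from the PAH-Finder repository: the function names `normalize_fragments` and `intensity_profile`, the equal-width binning, and the example peak list are all assumptions made for illustration, and the abstract does not specify the seven features actually used. The sketch only shows how fragment m/z values might be rescaled to 0–100% of the molecular ion m/z so that spectra of different-sized molecules become comparable.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def normalize_fragments(peaks: List[Peak], molecular_ion_mz: float) -> List[Peak]:
    """Rescale each fragment m/z to a percentage of the molecular ion m/z.

    Peaks above the molecular ion (for example isotope peaks) are dropped
    here for simplicity; that choice is an assumption of this sketch.
    """
    if molecular_ion_mz <= 0:
        raise ValueError("molecular_ion_mz must be positive")
    return [
        (100.0 * mz / molecular_ion_mz, intensity)
        for mz, intensity in peaks
        if 0 < mz <= molecular_ion_mz
    ]


def intensity_profile(normalized: List[Peak], n_bins: int = 10) -> List[float]:
    """Fraction of total intensity falling in each equal-width bin of 0-100%."""
    profile = [0.0] * n_bins
    total = sum(intensity for _, intensity in normalized)
    if total == 0:
        return profile
    width = 100.0 / n_bins
    for percent, intensity in normalized:
        # The molecular ion sits at exactly 100%, so clamp it into the last bin.
        index = min(int(percent / width), n_bins - 1)
        profile[index] += intensity / total
    return profile


# Made-up peak list (not real spectral data): molecular ion at m/z 200.
example_peaks = [(200.0, 100.0), (100.0, 20.0), (50.0, 5.0)]
normalized = normalize_fragments(example_peaks, molecular_ion_mz=200.0)
profile = intensity_profile(normalized, n_bins=4)
```

A profile like this could serve as input to a classifier such as a random forest, although the features in the published workflow may be defined differently.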