ChloroDBPFinder: Machine Learning-Guided Recognition of Chlorinated Disinfection Byproducts from Nontargeted LC-HRMS Analysis

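Chemistry, PubChem, Artificial intelligence, Machine learning, Support vector machine, Random forest, Pattern recognition (psychology), Chromatography, Computer science, Organic chemistry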
Authors
Tingting Zhao, Nicholas J. P. Wawryk, Shipei Xing, Brian J. Low, Gigi Li, Huaxu Yu, Yukai Wang, Qiming Shen, Xing‐Fang Li, Tao Huan
Source
Journal: Analytical Chemistry [American Chemical Society]
Volume (issue): 96 (6): 2590-2598; Citations: 3
Identifier
DOI: 10.1021/acs.analchem.3c05124
Abstract

High-resolution mass spectrometry (HRMS) is a prominent analytical tool that characterizes chlorinated disinfection byproducts (Cl-DBPs) in an unbiased manner. Due to the diversity of chemicals, complex background signals, and the inherent analytical fluctuations of HRMS, conventional isotopic pattern (37Cl/35Cl), mass defect, and direct molecular formula (MF) prediction are insufficient for accurate recognition of the diverse Cl-DBPs in real environmental samples. This work proposes a novel strategy to recognize Cl-containing chemicals based on machine learning. Our hierarchical machine learning framework has two random forest-based models: the first layer is a binary classifier to recognize Cl-containing chemicals, and the second layer is a multiclass classifier to annotate the number of Cl present. This model was trained using ∼1.4 million distinctive MFs from PubChem. Evaluated on over 14,000 unique MFs from NIST20, this machine learning model achieved 93.3% accuracy in recognizing Cl-containing MFs (Cl-MFs) and 92.9% accuracy in annotating the number of Cl for Cl-MFs. Furthermore, the trained model was integrated into ChloroDBPFinder, a standalone R package for the streamlined processing of LC-HRMS data and annotating both known and unknown Cl-containing compounds. Tested on existing Cl-DBP data sets related to aspartame chlorination in tap water, our ChloroDBPFinder efficiently extracted 159 Cl-containing DBP features and tentatively annotated the structures of 10 Cl-DBPs via molecular networking. In another application of a chlorinated humic substance, ChloroDBPFinder extracted 79 high-quality Cl-DBPs and tentatively annotated six compounds. In summary, our proposed machine learning strategy and the developed ChloroDBPFinder provide an advanced solution to identifying Cl-containing compounds in nontargeted analysis of water samples. It is freely available on GitHub (https://github.com/HuanLab/ChloroDBPFinder).
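
The two-layer design described in the abstract (a binary Cl / no-Cl classifier, followed by a Cl-count classifier that runs only on positives) can be sketched as a simple cascade. The Python below is only an illustration under stated assumptions: the paper's layers are random forest models trained on PubChem MFs, whereas `toy_is_cl` and `toy_n_cl` are hypothetical stand-ins that use a single feature, the observed M+2/M intensity ratio, compared against the binomial 37Cl/35Cl pattern. All function names and the 0.5 threshold factor are invented for this sketch, and none of it reflects the ChloroDBPFinder API.

```python
from math import comb
from typing import Callable, List

# Approximate natural abundances of the two stable chlorine isotopes.
P_35CL = 0.7576
P_37CL = 0.2424


def cl_isotope_pattern(n_cl: int) -> List[float]:
    """Relative intensities of M, M+2, M+4, ... from Cl isotopes alone.

    Binomial distribution over n_cl chlorine atoms, normalised so M = 1.
    """
    probs = [
        comb(n_cl, k) * P_37CL**k * P_35CL ** (n_cl - k)
        for k in range(n_cl + 1)
    ]
    return [p / probs[0] for p in probs]


def toy_is_cl(m2_ratio: float) -> bool:
    """Hypothetical layer 1: flag a feature as Cl-containing.

    Assumed rule: the M+2/M ratio is at least half the one-Cl ratio.
    """
    return m2_ratio >= 0.5 * cl_isotope_pattern(1)[1]


def toy_n_cl(m2_ratio: float, max_cl: int = 6) -> int:
    """Hypothetical layer 2: pick the Cl count whose theoretical
    M+2/M ratio is closest to the observed one."""
    return min(
        range(1, max_cl + 1),
        key=lambda n: abs(cl_isotope_pattern(n)[1] - m2_ratio),
    )


def annotate_cl(
    m2_ratio: float,
    is_cl_model: Callable[[float], bool],
    n_cl_model: Callable[[float], int],
) -> int:
    """Two-layer cascade: layer 2 runs only if layer 1 says 'Cl present'.

    Returns 0 when layer 1 rejects the feature.
    """
    if not is_cl_model(m2_ratio):
        return 0
    return n_cl_model(m2_ratio)
```

For example, `annotate_cl(0.65, toy_is_cl, toy_n_cl)` returns 2, because 0.65 is closest to the two-chlorine ratio of about 0.64. The ratio-only stand-ins ignore M+2 contributions from other elements (for example 34S), which is the kind of limitation of isotopic-pattern rules that the abstract cites as motivation for trained models; swapping the stand-ins for trained classifiers leaves the cascade itself unchanged.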