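Chemistry
Compendium
Computer science
Bottleneck
Data mining
Set (abstract data type)
Information retrieval
Artificial intelligence
Archaeology
Embedded system
History
Programming language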
Authors
Thomas V. Harwood, Mingxun Wang, Trent R. Northen, Benjamin P. Bowen
Identifier
DOI:10.1021/acs.analchem.5c02591
Abstract
A significant bottleneck in metabolomics data interpretation is the effective use of domain knowledge to assign structural information based on fragmentation patterns. The mass spectrometry query language (MassQL) aims to make this process accessible and applicable across multiple analysis platforms. While advanced computational methods are capable of predicting compound structures from fragmentation data, AI/ML approaches often rely on complex, opaque criteria that are difficult to interpret or modify. As a result, their predictive patterns cannot be readily translated into human-readable rules, such as those used in MassQL. In this study, we introduce ChemEcho, a machine learning embedding method that converts tandem mass spectrometry data into sparse feature vectors containing peak and neutral mass subformulae to enhance explainable AI/ML-based methods. An advantage of this approach is that decision trees trained using these feature vectors can be directly translated to MassQL. Using a battery of decision trees trained using ChemEcho embeddings to predict molecular attributes, we generated over 1500 MassQL queries for 765 molecular features and evaluated their precision and recall. From these queries, the 50 highest-performing queries were integrated into the MassQL compendium. This set of generated MassQL queries included environmentally and biologically relevant classes such as PFAS and molecules containing phosphate or sulfate substructures. To illustrate the impact these queries would have on a typical metabolomics experiment, these MassQL queries were applied to a public metabolomics data set, resulting in a marked increase in the structural information derived from tandem mass spectra. Access and reuse of these queries are expected to enhance structural annotation in untargeted experiments, leading to more specific claims and advancing many applications in metabolomics.
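The translation step described in the abstract (a decision-tree path over peak and neutral-loss features becoming a MassQL query) can be illustrated with a minimal sketch. This is not ChemEcho's code: the `Condition` class, the `path_to_massql` and `precision_recall` helpers, and the example m/z values are hypothetical and chosen only for illustration. The sketch assumes each tree split tests for the presence of one fragment peak or one neutral loss, emits the MassQL `MS2PROD` and `MS2NL` conditions with a `TOLERANCEMZ` qualifier, and ignores "absent" branches for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """One split on a root-to-leaf path of a decision tree (hypothetical type)."""
    kind: str      # "peak" = fragment ion, "loss" = neutral loss
    mz: float      # mass of the peak or neutral loss (illustrative values)
    present: bool  # True if the path requires the feature to be present


# Assumed mapping from feature kind to the MassQL condition keyword.
MASSQL_FIELD = {"peak": "MS2PROD", "loss": "MS2NL"}


def path_to_massql(path, tol=0.01):
    """Join the presence conditions of one tree path into a MassQL query string.

    Absence conditions are skipped in this sketch, so the resulting query
    can match more spectra than the original tree path would.
    """
    clauses = [
        f"{MASSQL_FIELD[c.kind]}={c.mz:.4f}:TOLERANCEMZ={tol}"
        for c in path
        if c.present
    ]
    if not clauses:
        raise ValueError("path has no presence conditions to translate")
    return "QUERY scaninfo(MS2DATA) WHERE " + " AND ".join(clauses)


def precision_recall(predicted, actual):
    """Precision and recall of a query's hits against labeled spectrum IDs."""
    predicted, actual = set(predicted), set(actual)
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    return precision, recall


# Illustrative path for a phosphate-containing molecule in negative mode:
# a PO3- fragment peak plus a neutral loss of H3PO4 (values are assumptions).
phosphate_path = [
    Condition("peak", 78.9591, True),
    Condition("loss", 97.9769, True),
]
query = path_to_massql(phosphate_path)
print(query)
```

Under these assumptions, each root-to-leaf path that predicts a molecular attribute yields one query, and `precision_recall` scores that query by comparing the spectra it returns with the spectra known to carry the attribute.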