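Chemistry
Identification (biology)
Chemical space
Computer science
Pattern recognition (psychology)
Artificial intelligence
Drug discovery
Plant
Biology
Biochemistry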
Authors
Ting Xie, Hailiang Zhang, Qiong Yang, Jinyu Sun, Yue Wang, Long Jia, Zhimin Zhang, Hongmei Lü
Identifier
DOI:10.1021/acs.analchem.5c01594
Abstract
Tandem mass spectrometry (MS/MS) is a cornerstone of compound identification in complex mixtures, but conventional spectral matching approaches are constrained by limited library coverage and by the matching algorithms themselves. To address this, we propose CSU-MS2 (Contrastively Spectral-structural Unification framework for MS/MS Spectra and Molecular Structures), a novel framework that bridges MS/MS spectra and molecular structures through cross-modal contrastive learning. CSU-MS2 integrates an External Space Attention Aggregation (ESA) module to dynamically align spectral and structural features, enabling direct retrieval of molecular candidates from a unified embedding space. The framework is pretrained on large-scale in silico MS/MS data sets generated by CFM-ID and ICEBERG and then fine-tuned on high-quality experimental data. CSU-MS2 achieves a Recall@1 of 75.45% when matching 1047 spectra against a reference library containing 1,001,047 compounds, surpassing existing methods such as CFM-ID (68.38%), SIRIUS (64.85%), MetFrag (48.59%), and CMSSP (30.47%). Furthermore, validation on three external data sets spanning human metabolomics (MTBLS265), plant metabolites (PMhub), and the CASMI 2022 challenge demonstrates robust generalizability, with domain-specific retrieval achieving a Recall@10 of 91.67% for blood metabolites. To facilitate compound identification across various domains, we have assembled a Spectrum-searchable Structural Feature Database (SSFDB) from 23 structural databases and deployed an open-source web server supporting customizable cross-modal retrieval. All code, models, and SSFDB are publicly accessible, offering a transformative solution for high-throughput compound identification in metabolomics and beyond.
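The Recall@k figures quoted in the abstract can be illustrated with a minimal sketch of retrieval in a shared embedding space. This is not the CSU-MS2 code: the abstract does not state the similarity measure, so cosine similarity is an assumption here, and the function names (`cosine_similarity`, `rank_candidates`, `recall_at_k`) and the two-dimensional toy vectors are hypothetical stand-ins for learned spectrum and structure embeddings.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (assumed scoring function)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query: Vector, library: Sequence[Vector]) -> list[int]:
    """Return library indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, candidate) for candidate in library]
    return sorted(range(len(library)), key=lambda i: scores[i], reverse=True)


def recall_at_k(
    queries: Sequence[Vector],
    library: Sequence[Vector],
    true_indices: Sequence[int],
    k: int,
) -> float:
    """Fraction of queries whose true structure appears in the top-k candidates."""
    hits = 0
    for query, truth in zip(queries, true_indices):
        if truth in rank_candidates(query, library)[:k]:
            hits += 1
    return hits / len(queries)


# Toy data: three hypothetical structure embeddings and three spectrum embeddings.
library = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.1]]
true_indices = [0, 1, 2]  # index of the correct structure for each spectrum

print(recall_at_k(queries, library, true_indices, k=1))
print(recall_at_k(queries, library, true_indices, k=2))
```

In this toy setup the third spectrum ranks its true structure second, so it counts as a miss at k=1 and a hit at k=2. With a library of about one million candidates, as in the abstract, an exhaustive Python loop like this would be slow, and a vectorized or approximate nearest-neighbor search would be the more likely choice.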