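Annotation
Computer science
Reinforcement learning
Ranking (information retrieval)
Generative model
Chemistry
Artificial intelligence
PubChem
Generative grammar
Biochemistry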
Authors
Margaret R. Martin,Soha Hassoun
Identifier
DOI:10.1021/acs.analchem.5c01770
Abstract
Despite the growth of spectral reference libraries and the increasing availability of annotation tools, the rate at which molecular structures are assigned to tandem mass spectra remains low. Because not all chemical products are known or cataloged in databases, generative AI models are poised to address this gap through de novo generation of structural candidates. We develop a novel method, Optimized Molecular Generation (OMG), for de novo molecular generation in mass spectra annotation. OMG comprises two steps: molecular generation and candidate ranking. During molecular generation, we use transfer learning to fine-tune a prior, unbiased molecular generation model on molecules retrieved from PubChem based on a target molecular formula. We then apply reinforcement learning with custom scoring functions, arranged as a curriculum-learning scheme, to guide the generation of novel molecular candidates for a queried spectrum. After sampling the fine-tuned model, we rank the generated candidate structures. OMG fine-tunes REINVENT4's pretrained molecular generator and ranks the generated molecules using two recent ranking models, JESTR and ESP. We evaluate OMG on the CANOPUS and MassSpecGym data sets, on which it achieves top-1 accuracies of 10.51% and 2.42%, respectively, outperforming current baselines. Our work highlights the promise of transfer and reinforcement learning for guiding de novo generation in spectra annotation.
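The abstract does not spell out OMG's custom scoring functions or its evaluation code. As a minimal sketch, assuming each candidate's molecular formula is already available as a plain string (for example, computed beforehand with RDKit's `CalcMolFormula`), the Python below shows one hypothetical formula-match reward of the kind a reinforcement-learning scoring function could use, plus the top-k accuracy metric behind the reported top-1 numbers. `parse_formula`, `formula_score`, `top_k_accuracy`, and the `scale` parameter are illustrative names, not part of OMG or REINVENT4.

```python
import math
import re
from collections import Counter
from typing import Sequence

# Matches an element symbol (capital letter plus optional lowercase) and an optional count.
_ELEMENT_RE = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula such as 'C8H10N4O2' into element counts.

    Assumption: no parentheses, charges, or isotope labels.
    """
    counts: Counter = Counter()
    for element, number in _ELEMENT_RE.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def formula_score(candidate: str, target: str, scale: float = 1.0) -> float:
    """Hypothetical reward in (0, 1]: 1.0 for an exact formula match,
    decaying exponentially with the summed per-element count difference."""
    cand, targ = parse_formula(candidate), parse_formula(target)
    # Counter returns 0 for missing elements, so absent atoms count as differences.
    distance = sum(abs(cand[e] - targ[e]) for e in set(cand) | set(targ))
    return math.exp(-distance / scale)


def top_k_accuracy(ranked: Sequence[Sequence[str]], truths: Sequence[str], k: int = 1) -> float:
    """Fraction of queries whose true structure appears among the top-k candidates.

    Assumes candidates and truths use the same identifier (e.g. an InChIKey).
    """
    if len(ranked) != len(truths):
        raise ValueError("need exactly one ranked candidate list per query")
    hits = sum(truth in cands[:k] for cands, truth in zip(ranked, truths))
    return hits / len(truths)


if __name__ == "__main__":
    print(formula_score("C8H10N4O2", "C8H10N4O2"))  # 1.0 (exact match)
    print(round(formula_score("C8H11N4O2", "C8H10N4O2"), 3))  # 0.368 (one extra H)
    print(top_k_accuracy([["a", "b"], ["c", "d"]], ["b", "c"], k=1))  # 0.5
```

A graded reward, rather than a strict 0/1 formula match, lets near-miss candidates still earn partial credit; tightening `scale` over successive training stages is one hypothetical way to turn such a score into a curriculum.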