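Retrosynthetic analysis
Benchmark (surveying)
Set (abstract data type)
Computer science
Artificial intelligence
Process (computing)
Cheminformatics
Class (philosophy)
Training set
Data mining
Machine learning
Data set
Chemistry
Total synthesis
Computational chemistry
Organic chemistry
Geodesy
Programming language
Geography
Operating system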
Authors
Yingchao Yan, Yang Zhao, Huifeng Yao, Jie Feng, Liang Li, Weijie Han, Xiaohe Xu, Chengtao Pu, Chengdong Zang, Lingfeng Chen, Yuanyuan Li, Haichun Liu, Tao Lu, Yadong Chen, Yanmin Zhang
Identifier
DOI:10.1021/acs.jcim.3c00274
Abstract
Retrosynthesis prediction is crucial in organic synthesis and drug discovery, helping chemists design efficient synthetic routes for target molecules. Data-driven deep retrosynthesis prediction has gained importance thanks to new algorithms and greater computing power. Although existing models show some predictive power on the USPTO-50K benchmark data set, none considers the effect of byproducts during prediction, possibly because the benchmark data set lacks byproduct information. Here, we propose a novel two-stage, byproduct-based retrosynthesis prediction framework called RPBP. First, RPBP predicts the byproduct of the reaction from the product molecule. Then, it treats reactant prediction as an end-to-end problem whose input is the product together with the predicted byproduct. Unlike other methods that first identify the potential reaction center and then predict reactant molecules, RPBP draws on additional information carried by byproducts, such as reaction reagents, conditions, and sites. Interestingly, adding byproducts reduces model learning complexity in natural language processing (NLP). Our RPBP model achieves 54.7% and 66.6% top-1 retrosynthesis prediction accuracy when the reaction class is unknown and known, respectively. It outperforms existing methods for known-class reactions, thanks to the rich chemical information in byproducts. Retrosynthesis prediction for four kinase drugs from the literature demonstrates the model's practicality and its potential to accelerate drug discovery.
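The two-stage flow described in the abstract (product to byproduct, then product plus byproduct to reactants) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the trained models are replaced by toy dictionary lookups, the names `make_lookup_stage`, `two_stage_predict`, and `top1_accuracy` are hypothetical, and joining product and byproduct with the SMILES `.` separator is only one plausible input encoding. A real evaluation would also canonicalize SMILES before comparing them; here molecules are compared as plain strings, ignoring order.

```python
from typing import Callable, Dict, List

# A "stage" maps an input SMILES string to an output SMILES string.
Stage = Callable[[str], str]


def make_lookup_stage(table: Dict[str, str], default: str = "") -> Stage:
    """Toy stand-in for a trained sequence model: a dictionary lookup."""
    def stage(source: str) -> str:
        return table.get(source, default)
    return stage


def two_stage_predict(product: str, byproduct_model: Stage,
                      reactant_model: Stage, sep: str = ".") -> str:
    """Stage 1 predicts the byproduct; stage 2 predicts the reactants."""
    byproduct = byproduct_model(product)
    # Assumed encoding: product and byproduct joined by the SMILES '.' separator.
    source = f"{product}{sep}{byproduct}" if byproduct else product
    return reactant_model(source)


def same_molecule_set(a: str, b: str) -> bool:
    """Order-insensitive comparison of '.'-separated SMILES strings."""
    return sorted(a.split(".")) == sorted(b.split("."))


def top1_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of examples whose top-ranked prediction matches the reference."""
    hits = sum(same_molecule_set(p, r) for p, r in zip(predictions, references))
    return hits / len(references)


# Two textbook reactions used as toy data (not from the paper's data set):
# ethyl acetate + water      <- acetic acid + ethanol
# N-methylacetamide + HCl    <- acetyl chloride + methylamine
byproduct_model = make_lookup_stage({"CC(=O)OCC": "O", "CC(=O)NC": "Cl"})
reactant_model = make_lookup_stage({
    "CC(=O)OCC.O": "CC(=O)O.CCO",
    "CC(=O)NC.Cl": "CC(=O)Cl.CN",
})

products = ["CC(=O)OCC", "CC(=O)NC"]
references = ["CCO.CC(=O)O", "CN.CC(=O)Cl"]
predictions = [two_stage_predict(p, byproduct_model, reactant_model)
               for p in products]
accuracy = top1_accuracy(predictions, references)
print(predictions, accuracy)
```

In this sketch the byproduct acts as an extra hint to the second stage: water points toward an esterification-type disconnection and HCl toward an acyl chloride, which is the kind of additional information the abstract attributes to byproducts.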