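Interpretability
Chemical space
Artificial intelligence
Fragment (computing)
Chemistry
Modularity (biology)
Computer science
Relevance (law)
Identification (biology)
Modular design
Set (abstract data type)
Generalizability theory
Deep learning
Scalability
Representation (politics)
Drug discovery
Data set
Machine learning
Domain knowledge
Domain (mathematical analysis)
Transformative learning
Bootstrapping (finance)
Nature (archaeology)
Cheminformatics
Encoding
Data science
Bridging (networking)
Computational biology
Data-driven
Resource (disambiguation)
Ontology
Structural motif
Feature learning
Biological data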
Authors
Bingjie Zhu, Jie Liao, Huihui Liu, Xiaohui Fan, Yiyu Cheng
Identifier
DOI:10.1021/acs.analchem.5c03958
Abstract
Natural products (NPs) are a treasure trove for drug discovery, yet their structural complexity and extreme data scarcity critically hinder AI-driven exploration. To address this challenge, we present MSformer, a transformer-based architecture that leverages molecular fragments to systematically encode NP chemical space. These fragments, termed meta-structures, are generated by a mass spectrometry-inspired fragmentation algorithm. Unlike chemical models pretrained on comprehensive molecule databases, MSformer is pretrained entirely on a very limited NP data set, by deconstructing 400,000 NPs into 234 million meta-structures. This design enables MSformer to capture the structural richness and drug-like relevance of NPs. Evaluated on 14 tasks across the MoleculeNet and Therapeutics Data Commons data sets, MSformer outperforms state-of-the-art models, demonstrating superior generalizability in property prediction. The abundant meta-structures give MSformer hierarchical interpretability, which reveals task-specific structural determinants and successfully deconstructs approved drugs into bioactive fragments. By integrating domain knowledge with deep learning, MSformer establishes a transformative paradigm for NP-based drug discovery, offering a scalable framework to navigate nature's underexplored chemical repertoire and accelerate the identification of bioactive candidates.