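Genome
Computational biology
Biology
Microbiota
Modular design
Gene
Evolutionary biology
Genomics
Phylum
Ecology
Human microbiota
Biological classification
Metabolic pathway
Phylogenetics
Secondary metabolite
DNA sequencing
Genetics
Biodiversity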
Authors
Tingjun Xu, Yuwei Yang, Ruixin Zhu, Weili Lin, Jixuan Li, Y J Zheng, Peng Zhang, Guoqing Zhang, Guoping Zhao, Na Jiao
Source
Journal: PubMed
Date: 2026-04-30
Identifier
DOI: 10.1038/s43588-026-00983-1
Abstract
Microbial-derived secondary metabolites (SMs) hold great therapeutic potential but are predominantly discovered from cultured species, representing only a fraction of microbial biodiversity. Advances in metagenomics have unveiled reservoirs of biosynthetic gene clusters (BGCs), but translating genomic sequences into precise chemical structures remains challenging owing to the structural complexity of cryptic BGCs and the context-dependent substrate tolerance and cross-reactivity of modular biosynthetic domains. Here we present DeepSeMS, a transformer-based large language model that accurately predicts secondary metabolite chemical structures from BGC sequences. By encoding biosynthetic genes as functional domains and leveraging a feature-aligned data augmentation, DeepSeMS outperformed existing methods and successfully generated chemically valid predictions for 96.38% of cryptic BGCs. Applying DeepSeMS to a global ocean metagenome, we characterized over 60,000 secondary metabolites, revealing chemical diversity, ecological specificity and considerable biomedical potential, especially as antibiotics. This study underscores the capability of deep learning-driven approaches in revealing hidden biosynthetic potential of Earth's largest, yet largely unexplored, microbial ecosystem.