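Microbiota
Computer science
Generalization
Limitation
Artificial intelligence
Machine learning
Human Microbiome Project
Ecology
Transfer of learning
Benchmark (surveying)
Microbial ecology
Genome
Data science
Scalability
Foundation (evidence)
Computational biology
Human microbiota
Biology
Microbial population biology
Time scale
Human-computer interaction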
Authors
Haohong Zhang,Yuli Zhang,Xiaoli Ma,Jiayun Xiong,Ronghua Yang,Kang Ning
Identifier
DOI:10.1002/advs.202513333
Abstract
Microbial communities are integral to human health, biotechnology, and environmental systems, yet their analysis is hindered by data heterogeneity and batch effects across studies. Traditional supervised methods often fail to capture universal patterns, limiting their utility in diverse contexts. Here, we present the Microbial General Model (MGM), the first large‐scale foundation model for microbiome analysis, pretrained on 260,000 samples using transformer‐based language modeling. MGM employs self‐attention mechanisms and autoregressive pre‐training to learn contextualized representations of microbial compositions, enabling robust transfer learning for downstream tasks. Benchmark evaluations demonstrate MGM's superior performance over conventional methods (average ROC‐AUC = 0.99 vs. 0.68–0.97) in microbial community classification, with enhanced generalization across geographic regions. MGM also captures spatial and temporal microbial dynamics, as evidenced by its application to a longitudinal infant cohort, where it delineated delivery mode‐specific microbiome trajectories and identified keystone genera such as Bacteroides and Bifidobacterium in vaginal deliveries and Haemophilus in cesarean deliveries. Furthermore, through prompt‐guided generation, MGM produced realistic microbial profiles conditioned on disease labels. By integrating self‐supervised learning with domain‐specific fine‐tuning, MGM advances the scalability and precision of microbiome analyses, offering a unified framework for diagnostics, ecological studies, and therapeutic discovery.
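The abstract does not say how a sample is turned into a sequence for autoregressive language modeling. The following minimal sketch shows one plausible way to do it, under the assumption that genera are ordered by descending relative abundance. The function names, the `<bos>`/`<eos>` tokens, and the toy abundance values are all hypothetical and are not taken from MGM.

```python
from typing import Dict, List, Tuple

# Hypothetical special tokens; the abstract does not specify MGM's vocabulary.
BOS, EOS = "<bos>", "<eos>"


def profile_to_tokens(abundances: Dict[str, float], max_len: int = 512) -> List[str]:
    """Turn a genus -> relative abundance map into a token sequence.

    Assumption: genera are ranked by descending abundance (ties broken by
    name) and genera with zero abundance are dropped.
    """
    present = [(genus, value) for genus, value in abundances.items() if value > 0]
    ranked = sorted(present, key=lambda item: (-item[1], item[0]))
    genera = [genus for genus, _ in ranked][: max_len - 2]  # leave room for BOS/EOS
    return [BOS] + genera + [EOS]


def next_token_pairs(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Build autoregressive training pairs: each prefix predicts the next token."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Toy profile with made-up abundances, for illustration only.
sample = {
    "Bacteroides": 0.42,
    "Bifidobacterium": 0.31,
    "Haemophilus": 0.0,
    "Escherichia": 0.27,
}
tokens = profile_to_tokens(sample)
pairs = next_token_pairs(tokens)
```

In this sketch each `(prefix, target)` pair is one next-token prediction example. A transformer trained on such pairs would see only the preceding genera when predicting the next one, which is the generic meaning of autoregressive pre-training.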