生物
基因组
计算生物学
基因组
门
基因
进化生物学
遗传学
作者
Qilong Lai,Shuai Yao,Yitian Zha,Haohong Zhang,Haobo Zhang,Ying Ye,Qingyi Tong,Hong Bai,Kang Ning
摘要
Abstract Biosynthetic gene clusters (BGCs), key in synthesizing microbial secondary metabolites, are mostly hidden in microbial genomes and metagenomes. To unearth this vast potential, we present BGC-Prophet, a transformer-based language model for BGC prediction and classification. Leveraging the transformer encoder, BGC-Prophet captures location-dependent relationships between genes. As one of the pioneering ultrahigh-throughput tools, BGC-Prophet significantly surpasses existing methods in efficiency and fidelity, enabling comprehensive pan-phylogenetic and whole-metagenome BGC screening. Through the analysis of 85 203 genomes and 9428 metagenomes, BGC-Prophet has profiled an extensive array of sub-million BGCs. It highlights notable enrichment in phyla like Actinomycetota and the widespread distribution of polyketide, NRP, and RiPP BGCs across diverse lineages. It reveals enrichment patterns of BGCs following important geological events, suggesting environmental influences on BGC evolution. BGC-Prophet’s capabilities in detection of BGCs and evolutionary patterns offer contributions to deeper understanding of microbial secondary metabolites and application in synthetic biology.
科研通智能强力驱动
Strongly Powered by AbleSci AI