Computational biology
Gene
Computer science
Biology
Data science
Genetics
Authors
Theodore Wang,Bowen R. Qin,S J Li,Z. Wang,Xuejian Li,Yuanxu Jiang,Chenrui Qin,Qi Ouyang,Chunbo Lou,Long Qian
Source
Journal: Science Advances
[American Association for the Advancement of Science]
Date: 2025-04-09
Volume (issue): 11 (15)
Identifier
DOI:10.1126/sciadv.adt0402
Abstract
Mining and expanding high-quality genetic parts for synthetic biology and bioengineering are urgent needs in the research and development of next-generation biotechnology. However, gene mining has relied on sequence homology or ample expert knowledge, which fundamentally limits the establishment of a comprehensive genetic part catalog. In this work, we propose SYMPLEX (synthetic biological part mining platform by large language model–enabled knowledge extraction), a universal gene-mining platform based on large language models. We applied SYMPLEX to mine enzymes responsible for messenger RNA (mRNA) capping, a key process in eukaryotic posttranscriptional modification, and obtained thousands of diverse candidates with traceable evidence from biomedical literature and databases. Of the 46 experimentally tested integral capping enzyme candidates, 14 demonstrated in vivo cross-species capping activity, and 2 displayed superior in vitro activity over the commercial vaccinia capping enzymes currently used in mRNA vaccine production. SYMPLEX provides a distinct paradigm for functional gene mining and offers powerful tools to facilitate knowledge discovery in fundamental research.