Strain
Computer science
Natural language processing
Artificial intelligence
Biology
Anatomy
Authors
Zhitao Mao,Jun Du,Ruoyu Wang,Haoran Li,Jirun Guan,Zhenkun Shi,Xiaoping Liao,Hongwu Ma
Identifier
DOI:10.1101/2025.03.23.644789
Abstract
Synthetic biology seeks to engineer microbial cell factories for sustainable bioproduction, yet the optimization of these systems is impeded by the complexity of metabolic engineering and the protracted timelines of iterative design-build-test-learn (DBTL) cycles. Traditional computational approaches, such as constraint-based modeling, provide valuable insights but demand extensive manual curation. Large language models (LLMs) hold promise for automating knowledge extraction and strain design, yet conventional models like GPT-4 suffer from outdated corpora and hallucination errors in domain-specific tasks. SynBioGPT v1.0, a Retrieval-Augmented Generation (RAG)-enhanced LLM, improved knowledge retrieval through vector search but often retrieved semantically similar yet contextually irrelevant documents. Here, we introduce SynBioGPT v2.0 (https://synbiogpt.biodesign.ac.cn), which mitigates these limitations by decomposing queries into sub-questions and employing keyword-based searches. Tested on a 100-question synthetic biology benchmark, SynBioGPT v2.0 achieved 98% accuracy with the Claude-3.7-sonnet backend, a 10-percentage-point improvement over v1.0’s 88% with Llama3-8B-Instruct. This advance highlights the efficacy of query decomposition and precise retrieval in enhancing LLM utility for synthetic biology.
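The two retrieval ideas named in the abstract, query decomposition and keyword-based search, can be illustrated with a minimal sketch. This is not the SynBioGPT v2.0 implementation: the abstract does not describe how sub-questions are generated or how keyword matches are ranked, so the splitting rule, the stopword list, the overlap score, the toy corpus, and every function name below are assumptions made purely for illustration. A real system would presumably use an LLM to produce the sub-questions.

```python
import re

# Hypothetical toy corpus (not from the paper).
DOCS = {
    "doc1": "Escherichia coli strain engineered for lycopene production via the MEP pathway",
    "doc2": "CRISPR interference tunes gene expression in Corynebacterium glutamicum",
    "doc3": "Flux balance analysis predicts knockout targets for succinate production",
}

# Assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"which", "how", "does", "the", "for", "in", "via", "a", "of", "what", "is"}


def decompose(query):
    """Naive stand-in for LLM-based decomposition: split on '?' and the word 'and'."""
    parts = re.split(r"\?|\band\b", query)
    return [p.strip() for p in parts if p.strip()]


def keywords(text):
    """Lowercase alphanumeric tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def keyword_search(sub_question, docs, top_k=1):
    """Rank documents by the number of keywords shared with the sub-question."""
    q = keywords(sub_question)
    scored = [(len(q & keywords(body)), doc_id) for doc_id, body in docs.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by id
    return [doc_id for _, doc_id in scored[:top_k]]


def retrieve(query, docs):
    """Run a separate keyword search for each sub-question."""
    return {sq: keyword_search(sq, docs) for sq in decompose(query)}


if __name__ == "__main__":
    question = "Which strain produces lycopene and how does CRISPR interference tune expression?"
    for sub_question, hits in retrieve(question, DOCS).items():
        print(sub_question, "->", hits)
```

In this toy setup each sub-question is matched against the corpus separately, so a compound question can pull in one document per topic instead of a single document that is only loosely similar to the whole question.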