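Blueprint
Arabidopsis
Computer science
Function (biology)
Genomics
Data science
Functional genomics
Biotechnology
Biology
Gene
Engineering
Genetics
Genome
Mutant
Mechanical engineering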
Authors
Ruixiang Zhang, Yu Wang, Weiyang Yang, Jun Wen, Weizhi Liu, S. Zhi, Guangzhou Li, Nan Chai, Jia‐Qi Huang, Yongyao Xie, Xianrong Xie, Letian Chen, Miao Gu, Yao‐Guang Liu, Qinlong Zhu
Identifier
DOI:10.1002/advs.202503926
Abstract
Research into plant gene function is crucial for developing strategies to increase crop yields. Large language models (LLMs) offer a means to aggregate large amounts of data into a queryable format, but their output can contain inaccurate or false claims known as hallucinations. To minimize such hallucinations and produce high‐quality knowledge‐based outputs, the abstracts of over 60 000 plant research articles are compiled into a Chroma database for retrieval‐augmented generation (RAG). Linguistic data from 13 993 Arabidopsis (Arabidopsis thaliana) phenotypes and 23 323 gene functions are then used to fine‐tune the LLM Llama3‐8B, producing PlantGPT, a virtual expert in Arabidopsis phenotype–gene research. Evaluation of answers to test questions demonstrates that PlantGPT outperforms general LLMs in answering specialized questions. The findings provide a blueprint for functional genomics research in food crops and demonstrate the potential for developing LLMs for plant research modalities. To provide broader access and facilitate adoption, the online tool http://www.plantgpt.icu is developed, which allows researchers to use PlantGPT in their scientific investigations.
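
The retrieval step of RAG mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it swaps the Chroma database and learned embeddings for a plain bag-of-words cosine similarity so that it runs with the Python standard library alone. The function names (`tokenize`, `retrieve`, `build_prompt`), the prompt wording, and the three toy abstracts are all hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, abstracts, k=2):
    """Return up to k abstracts most similar to the question.

    Stand-in for a vector-database query; abstracts with zero
    similarity are dropped.
    """
    query_vec = Counter(tokenize(question))
    scored = [(cosine(query_vec, Counter(tokenize(doc))), doc) for doc in abstracts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(question, abstracts, k=2):
    """Prepend the retrieved abstracts to the question as grounding context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, abstracts, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Invented toy text, NOT taken from the paper's corpus.
ABSTRACTS = [
    "Hypothetical abstract A: a flowering time mutant shows late flowering in long days.",
    "Hypothetical abstract B: root hair length is reduced in a phosphate transport mutant.",
    "Hypothetical abstract C: leaf senescence is delayed under low nitrogen supply.",
]

if __name__ == "__main__":
    print(build_prompt("Which mutant shows late flowering?", ABSTRACTS, k=1))
```

The resulting prompt would then be passed to the language model, which is asked to answer from the retrieved context rather than from memory alone; that grounding is what is intended to reduce hallucinations.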