Development of a liver disease–specific large language model chat interface using retrieval-augmented generation

肝病学计算机科学集合（抽象数据类型）情报检索医学内科学程序设计语言

作者

Jin Ge,Steve Sun,Joseph F. Owens,Victor Galvez,Oksana Gologorskaya,Jennifer C. Lai,Mark J. Pletcher,Ki Lai

出处

期刊：Hepatology [Lippincott Williams & Wilkins]
日期：2024-03-07 卷期号：80 (5): 1158-1168 被引量：98

链接

nih.gov nih.govdoi.org

标识

DOI：10.1097/hep.0000000000000834

摘要

Background and Aims: Large language models (LLMs) have significant capabilities in clinical information processing tasks. Commercially available LLMs, however, are not optimized for clinical uses and are prone to generating hallucinatory information. Retrieval-augmented generation (RAG) is an enterprise architecture that allows the embedding of customized data into LLMs. This approach “specializes” the LLMs and is thought to reduce hallucinations. Approach and Results We developed “LiVersa,” a liver disease–specific LLM, by using our institution’s protected health information-complaint text embedding and LLM platform, “Versa.” We conducted RAG on 30 publicly available American Association for the Study of Liver Diseases guidance documents to be incorporated into LiVersa. We evaluated LiVersa’s performance by conducting 2 rounds of testing. First, we compared LiVersa’s outputs versus those of trainees from a previously published knowledge assessment. LiVersa answered all 10 questions correctly. Second, we asked 15 hepatologists to evaluate the outputs of 10 hepatology topic questions generated by LiVersa, OpenAI’s ChatGPT 4, and Meta’s Large Language Model Meta AI 2. LiVersa’s outputs were more accurate but were rated less comprehensive and safe compared to those of ChatGPT 4. Results: We evaluated LiVersa’s performance by conducting 2 rounds of testing. First, we compared LiVersa’s outputs versus those of trainees from a previously published knowledge assessment. LiVersa answered all 10 questions correctly. Second, we asked 15 hepatologists to evaluate the outputs of 10 hepatology topic questions generated by LiVersa, OpenAI’s ChatGPT 4, and Meta’s Large Language Model Meta AI 2. LiVersa’s outputs were more accurate but were rated less comprehensive and safe compared to those of ChatGPT 4. Conclusions: In this demonstration, we built disease-specific and protected health information-compliant LLMs using RAG. While LiVersa demonstrated higher accuracy in answering questions related to hepatology, there were some deficiencies due to limitations set by the number of documents used for RAG. LiVersa will likely require further refinement before potential live deployment. The LiVersa prototype, however, is a proof of concept for utilizing RAG to customize LLMs for clinical use cases.

求助该文献

最长约 10秒，即可获得该文献文件

Development of a liver disease–specific large language model chat interface using retrieval-augmented generation

今日热心研友