Comparative analysis of large language models in medical counseling: A focus on Helicobacter pylori infection

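Helicobacter pylori infection; Helicobacter pylori; Likert scale; Completeness (order theory); Rasch model; Medicine; English; Psychology; Statistics; Internal medicine; Mathematics; Mathematics education; Mathematical analysis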
Authors
Qingzhou Kong, Kunping Ju, Meng Wan, Jing Liu, Xiaoqi Wu, Yueyue Li, Xiuli Zuo, Yanqing Li
Source
Journal: Helicobacter [Wiley]
Volume (issue): 29 (1) Citations: 7
Identifier
DOI: 10.1111/hel.13055
Abstract

Background: Large language models (LLMs) are promising medical counseling tools, but the reliability of their responses remains unclear. We aimed to assess the feasibility of three popular LLMs as counseling tools for Helicobacter pylori infection in different counseling languages.

Materials and Methods: This study was conducted between November 20 and December 1, 2023. Three LLMs (ChatGPT 4.0 [LLM1], ChatGPT 3.5 [LLM2], and ERNIE Bot 4.0 [LLM3]) were each given 15 H. pylori-related questions, once in English and once in Chinese. Each chat was started with the “New Chat” function to avoid bias from correlation interference between conversations. Responses were recorded and blindly assigned to three reviewers for scoring on three established Likert scales: accuracy (range 1–6 points), completeness (range 1–3 points), and comprehensibility (range 1–3 points). The acceptable thresholds for the scales were set at a minimum of 4, 2, and 2, respectively. Finally, responses were compared across LLMs and across languages.

Results: The overall mean (SD) score was 4.80 (1.02) for accuracy, 1.82 (0.78) for completeness, and 2.90 (0.36) for comprehensibility. The acceptable proportions for the accuracy, completeness, and comprehensibility of the responses were 90%, 45.6%, and 100%, respectively. The acceptable proportion of the overall completeness score was higher for English responses than for Chinese responses (p = 0.034). For accuracy, the English responses of LLM3 were better than the Chinese responses (p = 0.0055). For completeness, the English responses of LLM1 were better than the Chinese responses (p = 0.0257). For comprehensibility, the English responses of LLM1 were better than the Chinese responses (p = 0.0496). No differences were found among the LLMs.

Conclusions: The LLMs responded satisfactorily to questions related to H. pylori infection. However, further improving completeness and reliability, along with considering language nuances, is crucial for optimizing overall performance.
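The scoring scheme in the Materials and Methods can be illustrated with a minimal sketch. It computes the mean, SD, and acceptable proportion for each scale using the thresholds stated in the abstract (4, 2, and 2). The function name `summarize`, the example scores, and the use of the sample standard deviation are assumptions made for illustration; the abstract does not say how the SD was computed, and the scores below are not the study's data.

```python
from statistics import mean, stdev

# Minimum acceptable score per scale, as stated in the abstract.
THRESHOLDS = {"accuracy": 4, "completeness": 2, "comprehensibility": 2}


def summarize(scores, threshold):
    """Return (mean, sample SD, acceptable proportion) for one scale.

    A score counts as acceptable when it is at or above the threshold.
    Using the sample SD is an assumption, not taken from the paper.
    """
    acceptable = sum(1 for s in scores if s >= threshold)
    return mean(scores), stdev(scores), acceptable / len(scores)


# Made-up scores for illustration only; these are NOT the study's data.
example_scores = {
    "accuracy": [5, 4, 6, 3, 5],            # 1-6 scale
    "completeness": [2, 1, 3, 1, 2],        # 1-3 scale
    "comprehensibility": [3, 3, 2, 3, 3],   # 1-3 scale
}

for scale, scores in example_scores.items():
    m, sd, prop = summarize(scores, THRESHOLDS[scale])
    print(f"{scale}: mean={m:.2f}, SD={sd:.2f}, acceptable={prop:.1%}")
```

A scale can have a mean near its threshold and still a low acceptable proportion, which is the pattern the abstract reports for completeness (mean 1.82, 45.6% acceptable).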