Comparative analysis of large language models in medical counseling: A focus on Helicobacter pylori infection

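Helicobacter pylori infection, Helicobacter pylori, Likert scale, Completeness (order theory), Rasch model, Medicine, English, Psychology, Statistics, Internal medicine, Mathematics, Mathematics education, Mathematical analysis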
Authors
Qingzhou Kong, Kunping Ju, Meng Wan, Jing Liu, Xiaoqi Wu, Yueyue Li, Xiuli Zuo, Yanqing Li
Source
Journal: Helicobacter [Wiley]
Volume (issue): 29 (1); Cited by: 7
Identifier
DOI:10.1111/hel.13055
Abstract

Background: Large language models (LLMs) are promising medical counseling tools, but the reliability of their responses remains unclear. We aimed to assess the feasibility of three popular LLMs as counseling tools for Helicobacter pylori infection in different counseling languages.

Materials and Methods: This study was conducted between November 20 and December 1, 2023. Three LLMs (ChatGPT 4.0 [LLM1], ChatGPT 3.5 [LLM2], and ERNIE Bot 4.0 [LLM3]) were each given 15 H. pylori-related questions, once in English and once in Chinese. Each chat was started with the "New Chat" function to avoid bias from carryover between correlated questions. Responses were recorded and blindly assigned to three reviewers, who scored them on three established Likert scales: accuracy (1–6 points), completeness (1–3 points), and comprehensibility (1–3 points). The acceptable thresholds for the scales were set at minimums of 4, 2, and 2, respectively. Finally, the responses were compared across the LLMs and between the two languages.

Results: The overall mean (SD) scores were 4.80 (1.02) for accuracy, 1.82 (0.78) for completeness, and 2.90 (0.36) for comprehensibility. The proportions of acceptable responses for accuracy, completeness, and comprehensibility were 90%, 45.6%, and 100%, respectively. The proportion of acceptable overall completeness scores was higher for English responses than for Chinese responses (p = 0.034). For accuracy, the English responses of LLM3 were better than its Chinese responses (p = 0.0055). For completeness, the English responses of LLM1 were better than its Chinese responses (p = 0.0257). For comprehensibility, the English responses of LLM1 were also better than its Chinese responses (p = 0.0496). No differences were found among the three LLMs.

Conclusions: The LLMs responded satisfactorily to questions related to H. pylori infection. However, further improving completeness and reliability, along with considering language nuances, is crucial for optimizing overall performance.
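The acceptability rule in the methods reduces to simple arithmetic: a response counts as acceptable when its score on a scale is at or above that scale's minimum (4 of 6 for accuracy, 2 of 3 for completeness and comprehensibility). The sketch below computes a mean (SD) and an acceptable proportion in the form reported in the results. It assumes each entry in `scores` is one response's score on one scale (the abstract does not say whether the three reviewers' ratings were averaged first) and uses the sample standard deviation (also an assumption). The names `SCALES` and `summarize` are hypothetical, for illustration only.

```python
from statistics import mean, stdev

# Scale ranges and minimum "acceptable" scores as stated in the abstract.
# This dictionary layout is hypothetical, for illustration only.
SCALES = {
    "accuracy": {"min": 1, "max": 6, "acceptable_at_least": 4},
    "completeness": {"min": 1, "max": 3, "acceptable_at_least": 2},
    "comprehensibility": {"min": 1, "max": 3, "acceptable_at_least": 2},
}


def summarize(dimension: str, scores: list) -> dict:
    """Return mean, sample SD, and the share of scores at or above the threshold.

    Hypothetical helper: assumes one score per response on the given scale.
    """
    if not scores:
        raise ValueError("at least one score is required")
    scale = SCALES[dimension]
    # Reject scores outside the Likert range for this dimension.
    for s in scores:
        if not scale["min"] <= s <= scale["max"]:
            raise ValueError(
                f"{dimension} score {s} outside {scale['min']}-{scale['max']}"
            )
    threshold = scale["acceptable_at_least"]
    acceptable = sum(1 for s in scores if s >= threshold)
    return {
        "mean": mean(scores),
        # Sample SD (n - 1 denominator); assumed, since the abstract does not specify.
        "sd": stdev(scores) if len(scores) > 1 else 0.0,
        "acceptable_proportion": acceptable / len(scores),
    }
```

For example, `summarize("completeness", [1, 2, 3, 2])` counts three of the four scores as acceptable, because only the score of 1 falls below the completeness threshold of 2.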