Currently Available Large Language Models Do Not Provide Musculoskeletal Treatment Recommendations That Are Concordant With Evidence‐Based Clinical Practice Guidelines

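Clinical clerkship, Medicine, Intensive care medicine, Medical physics, Computer science, Physical therapy
Authors
Benedict U. Nwachukwu, Nathan H. Varady, Answorth A. Allen, Joshua S. Dines, David W. Altchek, Riley J. Williams, Kyle N. Kunze
Source
Journal: Arthroscopy [Elsevier BV]
Volume (issue): 41 (2): 263-263 Cited by: 37
Identifier
DOI: 10.1016/j.arthro.2024.07.040
Abstract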

PURPOSE: To determine whether several leading, commercially available large language models (LLMs) provide treatment recommendations concordant with evidence-based clinical practice guidelines (CPGs) developed by the American Academy of Orthopaedic Surgeons (AAOS).
METHODS: All CPGs concerning the management of rotator cuff tears (n = 33) and anterior cruciate ligament injuries (n = 15) were extracted from the AAOS. Treatment recommendations from Chat-Generative Pretrained Transformer version 4 (ChatGPT-4), Gemini, Mistral-7B, and Claude-3 were graded by 2 blinded physicians as being concordant, discordant, or indeterminate (i.e., neutral response without definitive recommendation) with respect to AAOS CPGs. The overall concordance between LLM and AAOS recommendations was quantified, and the comparative overall concordance of recommendations among the 4 LLMs was evaluated through the Fisher exact test.
RESULTS: Overall, 135 responses (70.3%) were concordant, 43 (22.4%) were indeterminate, and 14 (7.3%) were discordant. Inter-rater reliability for concordance classification was excellent (κ = 0.92). Concordance with AAOS CPGs was most frequently observed with ChatGPT-4 (n = 38, 79.2%) and least frequently observed with Mistral-7B (n = 28, 58.3%). Indeterminate recommendations were most frequently observed with Mistral-7B (n = 17, 35.4%) and least frequently observed with Claude-3 (n = 8, 16.7%). Discordant recommendations were most frequently observed with Gemini (n = 6, 12.5%) and least frequently observed with ChatGPT-4 (n = 1, 2.1%). Overall, no statistically significant difference in concordant recommendations was observed across LLMs (P = .12). Of all recommendations, only 20 (10.4%) were transparent and provided references with full bibliographic details or links to specific peer-reviewed content to support recommendations.
CONCLUSIONS: Among leading commercially available LLMs, more than 1-in-4 recommendations concerning the evaluation and management of rotator cuff and anterior cruciate ligament injuries do not reflect current evidence-based CPGs. Although ChatGPT-4 showed the highest performance, clinically significant rates of recommendations without concordance or supporting evidence were observed. Only 10% of responses by LLMs were transparent, precluding users from fully interpreting the sources from which recommendations were provided.
CLINICAL RELEVANCE: Although leading LLMs generally provide recommendations concordant with CPGs, a substantial error rate exists, and the proportion of recommendations that do not align with these CPGs suggests that LLMs are not trustworthy clinical support tools at this time. Each off-the-shelf, closed-source LLM has strengths and weaknesses. Future research should evaluate and compare multiple LLMs to avoid bias associated with narrow evaluation of few models as observed in the current literature.
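The percentages in the abstract can be checked against the reported counts with a short sketch. It assumes (the abstract does not state this explicitly) that each of the 4 LLMs answered each of the 33 + 15 = 48 recommendations exactly once, giving 48 responses per model and 192 overall. The names `pct`, `overall`, and `per_model` are illustrative and not from the paper.

```python
# Sanity check of the reported proportions.
# Assumption: one response per model per guideline recommendation.
GUIDELINES_PER_MODEL = 33 + 15   # rotator cuff + ACL recommendations
N_MODELS = 4                     # ChatGPT-4, Gemini, Mistral-7B, Claude-3
TOTAL = GUIDELINES_PER_MODEL * N_MODELS


def pct(count: int, denom: int) -> float:
    """Return count/denom as a percentage rounded to one decimal place."""
    return round(100 * count / denom, 1)


# Overall counts as reported in the RESULTS section.
overall = {"concordant": 135, "indeterminate": 43, "discordant": 14}
overall_pct = {k: pct(v, TOTAL) for k, v in overall.items()}

# Share of responses that were not concordant (indeterminate + discordant),
# which is the basis for the "more than 1-in-4" statement.
non_concordant_pct = pct(overall["indeterminate"] + overall["discordant"], TOTAL)

# Per-model counts that the abstract reports (only these six are given).
per_model = {
    ("ChatGPT-4", "concordant"): 38,
    ("Mistral-7B", "concordant"): 28,
    ("Mistral-7B", "indeterminate"): 17,
    ("Claude-3", "indeterminate"): 8,
    ("Gemini", "discordant"): 6,
    ("ChatGPT-4", "discordant"): 1,
}
per_model_pct = {k: pct(v, GUIDELINES_PER_MODEL) for k, v in per_model.items()}

transparent_pct = pct(20, TOTAL)

if __name__ == "__main__":
    print(overall_pct)
    print(non_concordant_pct)
    for key, value in per_model_pct.items():
        print(key, value)
    print(transparent_pct)
```

Under that assumption the three overall counts sum to 192, and 8 of 48 indeterminate responses for Claude-3 works out to 16.7%. The Fisher exact test itself is not reproduced here because the abstract does not give the concordance counts for Gemini and Claude-3.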