Large Language Models for Diagnosing Focal Liver Lesions From CT/MRI Reports: A Comparative Study With Radiologists

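Medical diagnosis, Histopathology, Retrospective cohort study, Medicine, Differential diagnosis, Radiology, Magnetic resonance imaging, Pathology
Authors
Liuji Sheng, Yidi Chen, Hong Wei, Feng Che, Yingyi Wu, Qin Qin, Chongtu Yang, Yanshu Wang, Jingwen Peng, Mustafa R. Bashir, Maxime Ronot, Bin Song, Hanyu Jiang
Source
Journal: Liver International [Wiley]
Volume (issue): 45 (6): e70115-e70115 Citations: 15
Identifier
DOI: 10.1111/liv.70115
Abstract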

BACKGROUND & AIMS: Whether large language models (LLMs) can be integrated into the diagnostic workflow for focal liver lesions (FLLs) remains unclear. We aimed to investigate the diagnostic accuracy of two generic LLMs (ChatGPT-4o and Gemini) based on CT/MRI reports, compared with and combined with radiologists of different experience levels.
METHODS: From April 2022 to April 2024, this single-center retrospective study included consecutive adult patients who underwent contrast-enhanced CT/MRI for a single FLL and subsequent histopathologic examination. The LLMs were prompted three times with clinical information and the "findings" section of radiology reports to provide differential diagnoses in descending order of likelihood, with the first considered the final diagnosis. In the research setting, six radiologists (three junior and three middle-level) independently reviewed the CT/MRI images and clinical information in two rounds (first alone, then with LLM assistance). In the clinical setting, diagnoses were retrieved from the "impressions" section of radiology reports. Diagnostic accuracy was assessed against histopathology.
RESULTS: A total of 228 patients (median age, 59 years; 155 males) with 228 FLLs (median size, 3.6 cm) were included. For the final diagnosis, the accuracy of two-step ChatGPT-4o (78.9%) was higher than that of single-step ChatGPT-4o (68.0%, p < 0.001) and single-step Gemini (73.2%, p = 0.004), similar to that of real-world radiology reports (80.0%, p = 0.34) and junior radiologists (78.9%-82.0%; p-values, 0.21 to > 0.99), but lower than that of middle-level radiologists (84.6%-85.5%; p-values, 0.001 to 0.02). No incremental diagnostic value of ChatGPT-4o was observed for any radiologist (p-values, 0.63 to > 0.99).
CONCLUSION: Two-step ChatGPT-4o matched the accuracy of real-world radiology reports and junior radiologists for diagnosing FLLs but was less accurate than middle-level radiologists and demonstrated little incremental diagnostic value.
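
As a reading aid, the LLM accuracies reported in the abstract can be converted into approximate case counts. The following is a minimal sketch, not the authors' analysis: it assumes each percentage uses all 228 lesions as its denominator, which the abstract does not state explicitly, and the helper names `implied_correct` and `accuracy_pct` are hypothetical.

```python
# Final-diagnosis accuracies (percent) for the LLMs, copied from the abstract.
REPORTED_ACCURACY = {
    "two-step ChatGPT-4o": 78.9,
    "single-step ChatGPT-4o": 68.0,
    "single-step Gemini": 73.2,
}

# Number of lesions in the study. Using it as the denominator for every
# accuracy is an assumption made for this illustration.
N_LESIONS = 228


def implied_correct(accuracy_percent: float, n: int = N_LESIONS) -> int:
    """Nearest whole number of correct diagnoses implied by a percentage."""
    return round(accuracy_percent / 100 * n)


def accuracy_pct(correct: int, n: int = N_LESIONS) -> float:
    """Accuracy in percent, rounded to one decimal as in the abstract."""
    return round(100 * correct / n, 1)


for name, pct in REPORTED_ACCURACY.items():
    k = implied_correct(pct)
    print(f"{name}: about {k}/{N_LESIONS} correct ({accuracy_pct(k)}%)")
```

Under that assumption, the gap between two-step and single-step ChatGPT-4o corresponds to roughly 25 additional correct final diagnoses out of 228.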