Large Language Models for Diagnosing Focal Liver Lesions From CT/MRI Reports: A Comparative Study With Radiologists

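Keywords: medical diagnosis, histopathology, retrospective cohort study, medicine, differential diagnosis, radiology, magnetic resonance imaging, pathology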
Authors
Liuji Sheng, Yidi Chen, Hong Wei, Feng Che, Yingyi Wu, Qin Qin, Chongtu Yang, Yanshu Wang, Jingwen Peng, Mustafa R. Bashir, Maxime Ronot, Bin Song, Hanyu Jiang
Source
Journal: Liver International [Wiley]
Volume (issue): 45 (6), e70115. Cited by: 15
Identifier
DOI: 10.1111/liv.70115
Abstract

BACKGROUND & AIMS: Whether large language models (LLMs) can be integrated into the diagnostic workflow for focal liver lesions (FLLs) remains unclear. We aimed to evaluate the diagnostic accuracy of two generic LLMs (ChatGPT-4o and Gemini) based on CT/MRI reports, compared with and combined with radiologists of different experience levels.
METHODS: This single-center retrospective study included consecutive adult patients who underwent contrast-enhanced CT/MRI for a single FLL and subsequent histopathologic examination between April 2022 and April 2024. The LLMs were prompted three times with clinical information and the "findings" section of radiology reports to provide differential diagnoses in descending order of likelihood, with the first-ranked diagnosis considered the final diagnosis. In the research setting, six radiologists (three junior and three middle-level) independently reviewed the CT/MRI images and clinical information in two rounds (first alone, then with LLM assistance). In the clinical setting, diagnoses were retrieved from the "impressions" section of radiology reports. Diagnostic accuracy was assessed against histopathology.
RESULTS: A total of 228 patients (median age, 59 years; 155 males) with 228 FLLs (median size, 3.6 cm) were included. For the final diagnosis, the accuracy of two-step ChatGPT-4o (78.9%) was higher than that of single-step ChatGPT-4o (68.0%, p < 0.001) and single-step Gemini (73.2%, p = 0.004), similar to that of real-world radiology reports (80.0%, p = 0.34) and junior radiologists (78.9%-82.0%; p-values, 0.21 to > 0.99), but lower than that of middle-level radiologists (84.6%-85.5%; p-values, 0.001 to 0.02). No incremental diagnostic value of ChatGPT-4o assistance was observed for any radiologist (p-values, 0.63 to > 0.99).
CONCLUSION: Two-step ChatGPT-4o achieved accuracy comparable to real-world radiology reports and junior radiologists in diagnosing FLLs, but it was less accurate than middle-level radiologists and provided little incremental diagnostic value.
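The scoring rule described in the methods (take the first-ranked differential as the final diagnosis and compare it with histopathology) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' analysis code. The abstract does not name the statistical test behind the paired p-values, so the exact McNemar test used here is an assumed, commonly used choice for comparing two readers scored on the same lesions. The abstract also does not say how the three LLM runs were combined, so the sketch scores a single run. The function names and toy labels are hypothetical.

```python
from math import comb

# Hypothetical helpers for illustration; not taken from the study.


def final_diagnosis(ranked_differentials):
    """Top-ranked differential, treated as the final diagnosis."""
    return ranked_differentials[0]


def accuracy(predictions, reference):
    """Share of lesions whose predicted diagnosis matches histopathology."""
    correct = sum(p == r for p, r in zip(predictions, reference))
    return correct / len(reference)


def mcnemar_exact(pred_a, pred_b, reference):
    """Two-sided exact McNemar p-value for two readers scored on the same lesions.

    Assumption: the abstract does not state which paired test was used.
    """
    # b: reader A correct and reader B wrong; c: the reverse (discordant pairs only).
    b = sum(a == r and x != r for a, x, r in zip(pred_a, pred_b, reference))
    c = sum(a != r and x == r for a, x, r in zip(pred_a, pred_b, reference))
    n = b + c
    if n == 0:
        return 1.0
    # Under H0 the discordant pairs split as Binomial(n, 0.5); double the smaller tail.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data with hypothetical labels (not study data).
reference = ["HCC", "HCC", "ICC", "FNH", "hemangioma", "HCC"]
llm_ranked = [["HCC", "ICC"], ["ICC", "HCC"], ["ICC", "HCC"],
              ["FNH", "adenoma"], ["hemangioma", "FNH"], ["HCC", "ICC"]]
reader = ["HCC", "HCC", "HCC", "FNH", "FNH", "HCC"]

llm_final = [final_diagnosis(d) for d in llm_ranked]
print(round(accuracy(llm_final, reference), 3))     # 0.833
print(round(accuracy(reader, reference), 3))        # 0.667
print(mcnemar_exact(llm_final, reader, reference))  # 1.0
```

Only lesions on which the two readers differ in correctness enter the McNemar test. With three discordant pairs, even a 3-0 split gives p = 0.25, so a toy set this small cannot reach significance.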