Potential of GPT-4 for Detecting Errors in Radiology Reports: Implications for Reporting Accuracy

Medicine, Workload, Medical physics, Confusion, Radiology, Psychoanalysis, Psychology, Computer science, Operating system
Authors
Roman Johannes Gertz, Thomas Dratsch, Alexander C. Bunck, Simon Lennartz, Andra-Iza Iuga, Martin Hellmich, Thorsten Persigehl, Lenhard Pennig, Carsten Herbert Gietzen, Philipp Fervers, David Maintz, Robert Hahnfeldt, Jonathan Kottlors
Source
Journal: Radiology [Radiological Society of North America]
Volume (issue): 311 (1) Citations: 41
Identifier
DOI: 10.1148/radiol.232714
Abstract

Background Errors in radiology reports may occur because of resident-to-attending discrepancies, speech recognition inaccuracies, and high workload. Large language models, such as GPT-4 (ChatGPT; OpenAI), may assist in generating reports.
Purpose To assess effectiveness of GPT-4 in identifying common errors in radiology reports, focusing on performance, time, and cost-efficiency.
Materials and Methods In this retrospective study, 200 radiology reports (radiography and cross-sectional imaging [CT and MRI]) were compiled between June 2023 and December 2023 at one institution. There were 150 errors from five common error categories (omission, insertion, spelling, side confusion, and other) intentionally inserted into 100 of the reports and used as the reference standard. Six radiologists (two senior radiologists, two attending physicians, and two residents) and GPT-4 were tasked with detecting these errors. Overall error detection performance, error detection in the five error categories, and reading time were assessed using Wald χ² tests and paired-sample t tests.
Results GPT-4 (detection rate, 82.7%; 124 of 150; 95% CI: 75.8, 87.9) matched the average detection performance of radiologists independent of their experience (senior radiologists, 89.3% [134 of 150; 95% CI: 83.4, 93.3]; attending physicians, 80.0% [120 of 150; 95% CI: 72.9, 85.6]; residents, 80.0% [120 of 150; 95% CI: 72.9, 85.6]; P value range, .522–.99). One senior radiologist outperformed GPT-4 (detection rate, 94.7%; 142 of 150; 95% CI: 89.8, 97.3; P = .006). GPT-4 required less processing time per radiology report than the fastest human reader in the study (mean reading time, 3.5 seconds ± 0.5 [SD] vs 25.1 seconds ± 20.1, respectively; P < .001; Cohen d = −1.08). The use of GPT-4 resulted in lower mean correction cost per report than the most cost-efficient radiologist ($0.03 ± 0.01 vs $0.42 ± 0.41; P < .001; Cohen d = −1.12).
Conclusion The radiology report error detection rate of GPT-4 was comparable with that of radiologists, potentially reducing work hours and cost. © RSNA, 2024 See also the editorial by Forman in this issue.
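
The detection rates and 95% CIs quoted in the Results can be checked with a few lines of arithmetic. The sketch below is illustrative only: the abstract does not say which interval method the authors used, so the Wilson score interval is an assumption here, chosen because it happens to reproduce the reported bounds to one decimal place. The function name `wilson_interval` and the dictionary `reported` are hypothetical names introduced for this example; the counts come from the abstract.

```python
import math


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for a binomial proportion.

    z = 1.959964 corresponds to a two-sided 95% interval.
    This method is an assumption; the abstract does not name one.
    """
    p = successes / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half, center + half


# Errors detected out of the 150 inserted errors, as quoted in the abstract.
reported = {
    "GPT-4": 124,
    "Senior radiologists (average)": 134,
    "Attending physicians (average)": 120,
    "Residents (average)": 120,
    "Best senior radiologist": 142,
}

for name, detected in reported.items():
    lo, hi = wilson_interval(detected, 150)
    rate = 100 * detected / 150
    print(f"{name}: {rate:.1f}% (95% CI: {100 * lo:.1f}, {100 * hi:.1f})")
```

Under this assumption, the GPT-4 line comes out as 82.7% with bounds of 75.8 and 87.9, in agreement with the abstract. A different interval method (for example, the normal approximation) would give slightly different bounds.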