Privacy-ensuring Open-weights Large Language Models Are Competitive with Closed-weights GPT-4o in Extracting Chest Radiography Findings from Free-Text Reports

Medicine; Radiography; Radiology
Authors
Sebastian Nowak,Benjamin Wulff,Yannik C. Layer,Maike Theis,Alexander Isaak,Babak Salam,Wolfgang Block,Daniel Kuetting,Claus C. Pieper,Julian A. Luetkens,Ulrike Attenberger,Alois M. Sprinkart
Source
Journal: Radiology [Radiological Society of North America]
Volume (issue): 314 (1)
Identifier
DOI:10.1148/radiol.240895
Abstract

Background: Large-scale secondary use of clinical databases requires automated tools for retrospective extraction of structured content from free-text radiology reports.

Purpose: To share data and insights on the application of privacy-preserving open-weights large language models (LLMs) for reporting content extraction with comparison to standard rule-based systems and the closed-weights LLMs from OpenAI.

Materials and Methods: In this retrospective exploratory study conducted between May 2024 and September 2024, zero-shot prompting of 17 open-weights LLMs was performed. These LLMs with model weights released under open licenses were compared with rule-based annotation and with OpenAI's GPT-4o, GPT-4o-mini, GPT-4-turbo, and GPT-3.5-turbo on a manually annotated public English chest radiography dataset (Indiana University, 3927 patients and reports). An annotated nonpublic German chest radiography dataset (18 500 reports, 16 844 patients [10 340 male; mean age, 62.6 years ± 21.5 {SD}]) was used to compare local fine-tuning of all open-weights LLMs via low-rank adaptation and 4-bit quantization to bidirectional encoder representations from transformers (BERT) with different subsets of reports (from 10 to 14 580). Nonoverlapping 95% CIs of macro-averaged F1 scores were defined as relevant differences.

Results: For the English reports, the highest zero-shot macro-averaged F1 score was observed for GPT-4o (92.4% [95% CI: 87.9, 95.9]); GPT-4o outperformed the rule-based CheXpert [Stanford University] (73.1% [95% CI: 65.1, 79.7]) but was comparable in performance to several open-weights LLMs (top three: Mistral-Large [Mistral AI], 92.6% [95% CI: 88.2, 96.0]; Llama-3.1-70b [Meta AI], 92.2% [95% CI: 87.1, 95.8]; and Llama-3.1-405b [Meta AI], 90.3% [95% CI: 84.6, 94.5]). For the German reports, Mistral-Large (91.6% [95% CI: 90.5, 92.7]) had the highest zero-shot macro-averaged F1 score among the six other open-weights LLMs and outperformed the rule-based annotation (74.8% [95% CI: 73.3, 76.1]). Using 1000 reports for fine-tuning, all LLMs (top three: Mistral-Large, 94.3% [95% CI: 93.5, 95.2]; OpenBioLLM-70b [Saama], 93.9% [95% CI: 92.9, 94.8]; and Mixtral-8×22b [Mistral AI], 93.8% [95% CI: 92.8, 94.7]) achieved significantly higher macro-averaged F1 scores than did BERT (86.7% [95% CI: 85.0, 88.3]); however, the differences were not relevant when 2000 or more reports were used for fine-tuning.

Conclusion: LLMs have the potential to outperform rule-based systems for zero-shot "out-of-the-box" structuring of report databases, with privacy-ensuring open-weights LLMs being competitive with closed-weights GPT-4o. Additionally, the open-weights LLMs outperformed BERT when moderate numbers of reports were used for fine-tuning.

Published under a CC BY 4.0 license. Supplemental material is available for this article. See also the editorial by Gee and Yao in this issue.
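The comparison rule in the abstract (nonoverlapping 95% CIs of macro-averaged F1 scores count as relevant differences) can be illustrated with a minimal Python sketch. The abstract does not state how the CIs were computed or how the findings were encoded, so the percentile bootstrap over reports, the one-binary-label-per-finding layout, and every function name below are assumptions made for illustration, not the authors' implementation. Only the interval values in the usage lines are taken from the abstract.

```python
import random


def f1_binary(truth, pred):
    """F1 score for one finding, given parallel lists of 0/1 labels."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(truth_rows, pred_rows):
    """Unweighted mean of per-finding F1 scores.

    Each row is one report; each column is one finding (assumed layout).
    """
    n_findings = len(truth_rows[0])
    per_finding = [
        f1_binary([row[j] for row in truth_rows], [row[j] for row in pred_rows])
        for j in range(n_findings)
    ]
    return sum(per_finding) / n_findings


def bootstrap_ci(truth_rows, pred_rows, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of macro F1, resampling reports (assumed method)."""
    rng = random.Random(seed)
    n = len(truth_rows)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(
            macro_f1([truth_rows[i] for i in idx], [pred_rows[i] for i in idx])
        )
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def is_relevant_difference(ci_a, ci_b):
    """True when two (lower, upper) intervals do not overlap."""
    return ci_a[1] < ci_b[0] or ci_b[1] < ci_a[0]


# 95% CIs reported in the abstract for the English zero-shot experiment.
gpt_4o = (87.9, 95.9)
chexpert = (65.1, 79.7)
mistral_large = (88.2, 96.0)

print(is_relevant_difference(gpt_4o, chexpert))       # True: intervals are disjoint
print(is_relevant_difference(gpt_4o, mistral_large))  # False: intervals overlap
```

Applied to the reported intervals, the rule reproduces the abstract's reading: GPT-4o differs relevantly from the rule-based CheXpert labeler but not from Mistral-Large.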