Feasibility of Using the Privacy-preserving Large Language Model Vicuna for Labeling Radiology Reports

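Medicine; Set (abstract data type); Retrospective cohort study; Receiver operating characteristic; Radiography; Machine learning; Radiology; Computer science; Pathology; Internal medicine; Programming language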

Authors

Pritam Mukherjee, Benjamin Hou, Ricardo Bigolin Lanfredi, Ronald M. Summers

Source

Journal: Radiology [Radiological Society of North America]
Date: 2023-10-01; Volume (issue): 309 (1); Citations: 29

Links

nih.gov, doi.org

Identifier

DOI: 10.1148/radiol.231147

Abstract

Background: Large language models (LLMs) such as ChatGPT, though proficient in many text-based tasks, are not suitable for use with radiology reports due to patient privacy constraints.

Purpose: To test the feasibility of using an alternative LLM (Vicuna-13B) that can be run locally for labeling radiography reports.

Materials and Methods: Chest radiography reports from the MIMIC-CXR and National Institutes of Health (NIH) data sets were included in this retrospective study. Reports were examined for 13 findings. Outputs reporting the presence or absence of the 13 findings were generated by Vicuna by using a single-step or multistep prompting strategy (prompts 1 and 2, respectively). Agreements between Vicuna outputs and CheXpert and CheXbert labelers were assessed using Fleiss κ. Agreement between Vicuna outputs from three runs under a hyperparameter setting that introduced some randomness (temperature, 0.7) was also assessed. The performance of Vicuna and the labelers was assessed in a subset of 100 NIH reports annotated by a radiologist with use of area under the receiver operating characteristic curve (AUC).

Results: A total of 3269 reports from the MIMIC-CXR data set (median patient age, 68 years [IQR, 59–79 years]; 161 male patients) and 25 596 reports from the NIH data set (median patient age, 47 years [IQR, 32–58 years]; 1557 male patients) were included. Vicuna outputs with prompt 2 showed, on average, moderate to substantial agreement with the labelers on the MIMIC-CXR (κ median, 0.57 [IQR, 0.45–0.66] with CheXpert and 0.64 [IQR, 0.45–0.68] with CheXbert) and NIH (κ median, 0.52 [IQR, 0.41–0.65] with CheXpert and 0.55 [IQR, 0.41–0.74] with CheXbert) data sets, respectively. Vicuna with prompt 2 performed at par (median AUC, 0.84 [IQR, 0.74–0.93]) with both labelers on nine of 11 findings.

Conclusion: In this proof-of-concept study, outputs of the LLM Vicuna reporting the presence or absence of 13 findings on chest radiography reports showed moderate to substantial agreement with existing labelers.

© RSNA, 2023. Supplemental material is available for this article. See also the editorial by Cai in this issue.
