Medicine
Ultrasound
Radiology
Thyroid
Medical physics
Text messaging
Internal medicine
World Wide Web
Computer science
Authors
Huan Jiang,Shujun Xia,Yixuan Yang,Jiale Xu,Qing Hua,Zihan Mei,Yiqing Hou,Minyan Wei,Limei Lai,Ning Li,Yijie Dong,JianQiao Zhou
Identifier
DOI:10.1016/j.ejrad.2024.111458
Abstract
Purpose: The importance of structured radiology reports is well recognized, as they facilitate efficient data extraction and promote collaboration among healthcare professionals. This study assesses the accuracy and reproducibility of ChatGPT, a large language model, in generating structured thyroid ultrasound reports.
Methods: This retrospective study included 184 nodules in 136 thyroid ultrasound reports from 136 patients. ChatGPT-3.5 and ChatGPT-4.0 were used to structure the reports based on ACR-TIRADS guidelines. Two radiologists evaluated the responses for quality, nodule categorization accuracy, and management recommendations. Each text was submitted twice to assess the consistency of the nodule classification and management recommendations.
Results: On 136 ultrasound reports from 136 patients (mean age, 52 years ± 12 [SD]; 61 male), ChatGPT-3.5 generated 202 satisfactory structured reports, while ChatGPT-4.0 produced only 69 (74.3 % vs. 25.4 %, odds ratio (OR) = 8.490, 95 % CI: 5.775–12.481, p < 0.001). ChatGPT-4.0 outperformed ChatGPT-3.5 in categorizing thyroid nodules, with an accuracy of 69.3 % compared to 34.5 % (OR = 4.282, 95 % CI: 3.145–5.831, p < 0.001). ChatGPT-4.0 also provided more comprehensive or correct management recommendations than ChatGPT-3.5 (OR = 1.791, 95 % CI: 1.297–2.473, p < 0.001). Finally, ChatGPT-4.0 exhibited higher consistency in categorizing nodules than ChatGPT-3.5 (ICC = 0.732 vs. ICC = 0.429), and both exhibited moderate consistency in management recommendations (ICC = 0.549 vs. ICC = 0.575).
Conclusions: Our study demonstrates the potential of ChatGPT in transforming free-text thyroid ultrasound reports into structured formats. ChatGPT-3.5 excels in generating structured reports, while ChatGPT-4.0 shows superior accuracy in nodule categorization and management recommendations.
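The first odds ratio in the Results can be checked from the reported counts. The sketch below is illustrative only and rests on two assumptions not stated in the abstract: that each model produced 272 responses (136 reports, each submitted twice, which is consistent with 202/272 ≈ 74.3 % and 69/272 ≈ 25.4 %), and that the confidence interval is a Wald interval on the log odds ratio. The function name `odds_ratio_wald_ci` is hypothetical.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI for a 2x2 table.

    a, b: satisfactory / unsatisfactory counts for group 1
    c, d: satisfactory / unsatisfactory counts for group 2
    z:    normal quantile (1.96 for a 95 % interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Assumption: 136 reports x 2 submissions = 272 responses per model.
TOTAL_RESPONSES = 272
GPT35_SATISFACTORY = 202  # reported in the abstract
GPT40_SATISFACTORY = 69   # reported in the abstract

odds_ratio, lower, upper = odds_ratio_wald_ci(
    GPT35_SATISFACTORY,
    TOTAL_RESPONSES - GPT35_SATISFACTORY,
    GPT40_SATISFACTORY,
    TOTAL_RESPONSES - GPT40_SATISFACTORY,
)
print(f"OR = {odds_ratio:.3f}, 95% CI: {lower:.3f}-{upper:.3f}")
```

Under these assumptions the result agrees with the reported OR of 8.490 (95 % CI: 5.775–12.481) to three decimals, which suggests the percentages are computed over 272 responses rather than 136 reports.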