Large Language Models in Summarizing Radiology Report Impressions for Lung Cancer in Chinese: Evaluation Study

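Verisimilitude, Radiology, Correctness, Computer science, Medicine, Medical physics, Artificial intelligence, Algorithm, Philosophy, Epistemology
Authors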
Danqing Hu,Shanyuan Zhang,Qing Liu,Zhu Xiaofeng,Bing Liu
Source
Journal: Journal of Medical Internet Research [JMIR Publications]
Volume/issue: 27: e65547
Identifier
DOI:10.2196/65547
Abstract

Background: Large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities in various natural language processing tasks, particularly in text generation. However, their effectiveness in summarizing radiology report impressions remains uncertain.

Objective: This study aims to evaluate the capability of nine LLMs (Tongyi Qianwen, ERNIE Bot, ChatGPT, Bard, Claude, Baichuan, ChatGLM, HuatuoGPT, and ChatGLM-Med) in summarizing Chinese radiology report impressions for lung cancer.

Methods: We collected 100 Chinese reports for each of three modalities, computed tomography (CT), positron emission tomography (PET)-CT, and ultrasound (US), from Peking University Cancer Hospital and Institute. All reports were from patients with suspected or confirmed lung cancer. Using these reports, we created zero-shot, one-shot, and three-shot prompts, with or without complete example reports, as inputs to generate impressions. We assessed the generated impressions with both automatic quantitative evaluation metrics and five human evaluation metrics (completeness, correctness, conciseness, verisimilitude, and replaceability). Two thoracic surgeons (SZ and BL) and one radiologist (QL) compared the generated impressions with reference impressions and scored them on the five human evaluation metrics.

Results: In the automatic quantitative evaluation, ERNIE Bot, Tongyi Qianwen, and Claude demonstrated the best overall performance in generating impressions for CT, PET-CT, and US reports, respectively. In the human semantic evaluation, ERNIE Bot outperformed the other LLMs in conciseness, verisimilitude, and replaceability on CT impression generation, while its completeness and correctness scores were comparable to those of other LLMs. Tongyi Qianwen excelled in PET-CT impression generation, with the highest scores for correctness, conciseness, verisimilitude, and replaceability. Claude achieved the best conciseness, verisimilitude, and replaceability scores on US impression generation, and its completeness and correctness scores were close to the best results obtained by other LLMs. The generated impressions were generally complete and correct but lacked conciseness and verisimilitude. Although one-shot and few-shot prompts improved conciseness and verisimilitude, clinicians noted a significant gap between the generated impressions and those written by radiologists.

Conclusions: Current LLMs can produce radiology impressions with high completeness and correctness but fall short in conciseness and verisimilitude, indicating that they cannot yet fully replace impressions written by radiologists.