Comparison of Large Language Models in Answering Immuno-Oncology Questions: A Cross-Sectional Study

Readability, Medicine, Oncology, Psychology, Medical Education, Internal Medicine, Family Medicine, Computer Science, Programming Language
Authors
Giovanni Maria Iannantuono, Dara Bracken-Clarke, Fatima Karzai, Hyoyoung Choo‐Wosoba, James L. Gulley, Charalampos S. Floudas
Source
Journal: Oncologist [AlphaMed Press]
Volume (Issue): 29 (5): 407-414 Citations: 14
Identifier
DOI: 10.1093/oncolo/oyae009
Abstract

Background: The capability of large language models (LLMs) to understand and generate human-readable text has prompted the investigation of their potential as educational and management tools for patients with cancer and healthcare providers.

Materials and Methods: We conducted a cross-sectional study aimed at evaluating the ability of ChatGPT-4, ChatGPT-3.5, and Google Bard to answer questions related to 4 domains of immuno-oncology (Mechanisms, Indications, Toxicities, and Prognosis). We generated 60 open-ended questions (15 for each section). Questions were manually submitted to LLMs, and responses were collected on June 30, 2023. Two reviewers evaluated the answers independently.

Results: ChatGPT-4 and ChatGPT-3.5 answered all questions, whereas Google Bard answered only 53.3% (P < .0001). The proportion of questions with reproducible answers was higher for ChatGPT-4 (95%) and ChatGPT-3.5 (88.3%) than for Google Bard (50%) (P < .0001). In terms of accuracy, the proportion of answers deemed fully correct was 75.4%, 58.5%, and 43.8% for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (P = .03). Furthermore, the proportion of responses deemed highly relevant was 71.9%, 77.4%, and 43.8% for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (P = .04). Regarding readability, the proportion of highly readable responses was higher for ChatGPT-4 (98.1%) and ChatGPT-3.5 (100%) than for Google Bard (87.5%) (P = .02).

Conclusion: ChatGPT-4 and ChatGPT-3.5 are potentially powerful tools in immuno-oncology, whereas Google Bard demonstrated relatively poorer performance. However, the risk of inaccuracy or incompleteness in the responses was evident in all 3 LLMs, highlighting the importance of expert-driven verification of the outputs returned by these technologies.
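As a rough illustration of the arithmetic behind the first reported result, the sketch below converts the stated percentages of answered questions into counts out of 60 and compares the three models with a Pearson chi-square test of independence. The abstract does not say which statistical test the authors used, so the choice of test is an assumption, and the helper names (`count_from_percent`, `chi_square_independence`, `p_value_dof2`) are hypothetical. The closed-form p-value applies only to 2 degrees of freedom, which is the case for a 3 x 2 table.

```python
import math


def count_from_percent(percent, total):
    """Recover an integer count from a rounded percentage of `total`."""
    return round(percent * total / 100)


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_dof2(stat):
    """Upper-tail p-value of a chi-square statistic; valid for 2 degrees of freedom only."""
    return math.exp(-stat / 2)


TOTAL_QUESTIONS = 60  # from the abstract: 60 open-ended questions per model

# Percentages of questions answered, as reported in the abstract.
answered_percent = {"ChatGPT-4": 100.0, "ChatGPT-3.5": 100.0, "Google Bard": 53.3}

# One row per model: [answered, not answered].
table = []
for model, percent in answered_percent.items():
    answered = count_from_percent(percent, TOTAL_QUESTIONS)
    table.append([answered, TOTAL_QUESTIONS - answered])

stat, dof = chi_square_independence(table)
print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p_value_dof2(stat):.2e}")
```

Under these assumptions the p-value falls well below .0001, in line with the reported result. The same helpers could be applied to the reproducibility percentages, but the denominators for the accuracy, relevance, and readability percentages are not given in the abstract and would have to be taken from the full paper.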