Application of AI Chatbot in Responding to Asynchronous Text-Based Messages From Patients With Cancer: Comparative Study

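Chatbot, Wilcoxon signed-rank test, Categorical variable, Telemedicine, eHealth, Conversation, Medicine, Ordinal regression, Descriptive statistics, Test (biology), Medical record, Family medicine, Computer science, Medical education, Health care, Psychology, Artificial intelligence, Machine learning, Statistics, Internal medicine, Economics, Paleontology, Biology, Economic growth, Communication, Mathematics, Mann-Whitney U test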
Authors
X. X. Bai, Shiyong Wang, Yuanli Zhao, Ming Fei Feng, Wenbin Ma, Xiaomin Liu
Source
Journal: Journal of Medical Internet Research [JMIR Publications]
Volume: 27, e67462
Identifier
DOI: 10.2196/67462
Abstract

Background: Telemedicine, which incorporates artificial intelligence such as chatbots, offers significant potential for enhancing health care delivery. However, the efficacy of artificial intelligence chatbots compared to human physicians in clinical settings remains underexplored, particularly in complex scenarios involving patients with cancer and asynchronous text-based interactions.

Objective: This study aimed to evaluate the performance of the GPT-4 (OpenAI) chatbot in responding to asynchronous text-based medical messages from patients with cancer by comparing its responses with those of physicians across two clinical scenarios: patient education and medical decision-making.

Methods: We collected 4257 deidentified asynchronous text-based medical consultation records from 17 oncologists across China between January 1, 2020, and March 31, 2024. Each record included patient questions, demographic data, and disease-related details. The records were categorized into two scenarios: patient education (eg, symptom explanations and test interpretations) and medical decision-making (eg, treatment planning). The GPT-4 chatbot was used to simulate physician responses to these records, with each session conducted in a new conversation to avoid cross-session interference. The chatbot responses, along with the original physician responses, were evaluated by a medical review panel (3 oncologists) and a patient panel (20 patients with cancer). The medical panel assessed completeness, accuracy, and safety using a 3-level scale, whereas the patient panel rated completeness, trustworthiness, and empathy on a 5-point ordinal scale. Statistical analyses included chi-square tests for categorical variables and Wilcoxon signed-rank tests for ordinal ratings.

Results: In the patient education scenario (n=2364), the chatbot scored higher than physicians in completeness (n=2301, 97.34% vs n=2213, 93.61% for fully complete responses; P=.002), with no significant differences in accuracy or safety (P>.05). In the medical decision-making scenario (n=1893), the chatbot exhibited lower accuracy (n=1834, 96.88% vs n=1855, 97.99% for fully accurate responses; P<.001) and trustworthiness (n=860, 50.71% vs n=1766, 93.29% rated as “Moderately trustworthy” or higher; P<.001) compared with physicians. Regarding empathy, the medical review panel rated the chatbot as demonstrating higher empathy scores across both scenarios, whereas the patient review panel reached the opposite conclusion, consistently favoring physicians in empathetic communication. Errors in chatbot responses were primarily due to misinterpretations of medical terminology or the lack of updated guidelines, with 3.12% (59/1893) of its responses potentially leading to adverse outcomes, compared with 2.01% (38/1893) for physicians.

Conclusions: The GPT-4 chatbot performs comparably to physicians in patient education by providing comprehensive and empathetic responses. However, its reliability in medical decision-making remains limited, particularly in complex scenarios requiring nuanced clinical judgment. These findings underscore the chatbot’s potential as a supplementary tool in telemedicine while highlighting the need for physician oversight to ensure patient safety and accuracy.
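As a quick consistency check on the figures above, the reported percentages can be recomputed from the reported counts. The sketch below is illustrative only and is not the authors' analysis code; the names `REPORTED`, `percent`, and `check` are hypothetical. It assumes each percentage uses the scenario size (2364 or 1893) as its denominator. The trustworthiness figures are left out because the abstract does not state their denominator.

```python
# Counts and percentages copied from the abstract:
# (label, numerator, denominator, reported percentage)
REPORTED = [
    ("education: chatbot fully complete", 2301, 2364, 97.34),
    ("education: physician fully complete", 2213, 2364, 93.61),
    ("decision-making: chatbot fully accurate", 1834, 1893, 96.88),
    ("decision-making: physician fully accurate", 1855, 1893, 97.99),
    ("decision-making: chatbot potentially adverse", 59, 1893, 3.12),
    ("decision-making: physician potentially adverse", 38, 1893, 2.01),
]


def percent(numerator, denominator):
    """Return the share as a percentage rounded to two decimals."""
    return round(100 * numerator / denominator, 2)


def check(rows, tolerance=0.005):
    """Compare each reported percentage with the value recomputed from counts.

    A row passes when the unrounded recomputed value lies within half a unit
    of the last reported decimal place (plus a tiny float margin).
    """
    results = []
    for label, num, den, reported in rows:
        exact = 100 * num / den
        ok = abs(exact - reported) <= tolerance + 1e-9
        results.append((label, percent(num, den), reported, ok))
    return results


if __name__ == "__main__":
    for label, recomputed, reported, ok in check(REPORTED):
        status = "ok" if ok else "MISMATCH"
        print(f"{label}: recomputed {recomputed}% vs reported {reported}% [{status}]")
```

Under that denominator assumption, all six recomputed values agree with the abstract to two decimal places.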