ChatGPT-4: An assessment of an upgraded artificial intelligence chatbot in the United States Medical Licensing Examination

Authors
Andrew Mihalache, Ryan S. Huang, Marko Popović, Rajeev H. Muni
Source
Journal: Medical Teacher [Taylor & Francis]
Volume/issue: 46 (3): 366-372; cited by: 74
Identifier
DOI: 10.1080/0142159X.2023.2249588
Abstract

Purpose: ChatGPT-4 is an upgraded version of an artificial intelligence chatbot. The performance of ChatGPT-4 on the United States Medical Licensing Examination (USMLE) has not been independently characterized. We aimed to assess the performance of ChatGPT-4 at responding to USMLE Step 1, Step 2CK, and Step 3 practice questions.

Method: Practice multiple-choice questions for the USMLE Step 1, Step 2CK, and Step 3 were compiled. Of 376 available questions, 319 (85%) were analyzed by ChatGPT-4 on March 21st, 2023. Our primary outcome was the performance of ChatGPT-4 for the practice USMLE Step 1, Step 2CK, and Step 3 examinations, measured as the proportion of multiple-choice questions answered correctly. Our secondary outcomes were the mean length of questions and responses provided by ChatGPT-4.

Results: ChatGPT-4 responded to 319 text-based multiple-choice questions from USMLE practice test material. ChatGPT-4 answered 82 of 93 (88%) questions correctly on USMLE Step 1, 91 of 106 (86%) on Step 2CK, and 108 of 120 (90%) on Step 3. ChatGPT-4 provided explanations for all questions. ChatGPT-4 spent 30.8 ± 11.8 s on average responding to practice questions for USMLE Step 1, 23.0 ± 9.4 s per question for Step 2CK, and 23.1 ± 8.3 s per question for Step 3. The mean length of practice USMLE multiple-choice questions that were answered correctly and incorrectly by ChatGPT-4 was similar (difference = 17.48 characters, SE = 59.75, 95% CI = [-100.09, 135.04], t = 0.29, p = 0.77). The mean length of ChatGPT-4's correct responses to practice questions was significantly shorter than the mean length of incorrect responses (difference = 79.58 characters, SE = 35.42, 95% CI = [9.89, 149.28], t = 2.25, p = 0.03).

Conclusions: ChatGPT-4 answered a remarkably high proportion of practice questions correctly for USMLE examinations.
ChatGPT-4 performed substantially better at USMLE practice questions than previous models of the same AI chatbot.

Keywords: artificial intelligence; natural language processing; United States Medical Licensing Examination; ChatGPT-4

Disclosure statement
The views expressed herein are those of the authors and do not necessarily reflect the position of the Federation of State Medical Boards or the National Board of Medical Examiners. Information reported in this manuscript has not been previously presented at a conference. Data were collected from the artificial intelligence chatbot ChatGPT, developed by OpenAI. As corresponding author, Rajeev H. Muni had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Data availability statement
The data that support the findings of this study may be requested at andrew.mihalache@mail.utoronto.ca, with support from the principal investigator, RHM.

Funding
MMP: financial support (to institution) – PSI Foundation, Fighting Blindness Canada. RHM: consultant – Alcon, Apellis, AbbVie, Bayer, Bausch Health, Roche; financial support (to institution) – Alcon, AbbVie, Bayer, Novartis, Roche.

Notes on contributors
Andrew Mihalache is an MD candidate at the Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario.
Ryan S. Huang is an MD candidate at the Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario.
Marko M. Popovic is the Chief Ophthalmology Resident in the Department of Ophthalmology and Vision Sciences at the University of Toronto and has completed a Master of Public Health at the Harvard T.H. Chan School of Public Health.
Rajeev H. Muni is a staff vitreoretinal surgeon at St. Michael's Hospital in Toronto, Ontario, and Associate Professor and Vice-Chair of Clinical Research in the Department of Ophthalmology and Vision Sciences at the University of Toronto.
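As a quick arithmetic check, the accuracy proportions and the length-comparison t-statistic quoted in the Results can be reproduced from the figures stated in the abstract (a minimal sketch using only those numbers, no new data):

```python
# Step-level (correct, total) counts as reported in the abstract.
results = {
    "Step 1": (82, 93),
    "Step 2CK": (91, 106),
    "Step 3": (108, 120),
}
for step, (correct, total) in results.items():
    print(f"{step}: {correct}/{total} = {correct / total:.0%}")

# Overall accuracy across all 319 questions.
correct_all = sum(c for c, _ in results.values())
total_all = sum(t for _, t in results.values())
print(f"Overall: {correct_all}/{total_all} = {correct_all / total_all:.0%}")

# The reported t-statistic for response length is the mean difference
# divided by its standard error: 79.58 / 35.42.
t = 79.58 / 35.42
print(f"t = {t:.2f}")  # matches the reported t = 2.25
```

This confirms the internal consistency of the reported figures: each step-level percentage rounds to the value in the Results (88%, 86%, 90%), the pooled accuracy is 281/319 (88%), and the quoted difference and standard error for response length reproduce t = 2.25.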