ChatGPT-4: An assessment of an upgraded artificial intelligence chatbot in the United States Medical Licensing Examination

Authors
Andrew Mihalache, Ryan S. Huang, Marko Popović, Rajeev H. Muni
Source
Journal: Medical Teacher [Informa]
Pages: 1-7; Cited by: 3
Identifier
DOI: 10.1080/0142159x.2023.2249588
Abstract

Purpose: ChatGPT-4 is an upgraded version of an artificial intelligence chatbot. The performance of ChatGPT-4 on the United States Medical Licensing Examination (USMLE) has not been independently characterized. We aimed to assess the performance of ChatGPT-4 at responding to USMLE Step 1, Step 2 CK, and Step 3 practice questions.

Method: Practice multiple-choice questions for the USMLE Step 1, Step 2 CK, and Step 3 were compiled. Of 376 available questions, 319 (85%) were analyzed by ChatGPT-4 on March 21st, 2023. Our primary outcome was the performance of ChatGPT-4 on the practice USMLE Step 1, Step 2 CK, and Step 3 examinations, measured as the proportion of multiple-choice questions answered correctly. Our secondary outcomes were the mean length of questions and responses provided by ChatGPT-4.

Results: ChatGPT-4 responded to 319 text-based multiple-choice questions from USMLE practice test material. ChatGPT-4 answered 82 of 93 (88%) questions correctly on USMLE Step 1, 91 of 106 (86%) on Step 2 CK, and 108 of 120 (90%) on Step 3. ChatGPT-4 provided explanations for all questions. ChatGPT-4 spent on average 30.8 ± 11.8 s per question on practice questions for USMLE Step 1, 23.0 ± 9.4 s per question for Step 2 CK, and 23.1 ± 8.3 s per question for Step 3. The mean length of practice USMLE multiple-choice questions that were answered correctly and incorrectly by ChatGPT-4 was similar (difference = 17.48 characters, SE = 59.75, 95% CI = [-100.09, 135.04], t = 0.29, p = 0.77). The mean length of ChatGPT-4's correct responses to practice questions was significantly shorter than the mean length of incorrect responses (difference = 79.58 characters, SE = 35.42, 95% CI = [9.89, 149.28], t = 2.25, p = 0.03).

Conclusions: ChatGPT-4 answered a remarkably high proportion of practice questions correctly for USMLE examinations. ChatGPT-4 performed substantially better at USMLE practice questions than previous models of the same AI chatbot.

Keywords: artificial intelligence; natural language processing; United States Medical Licensing Examination; ChatGPT-4

Disclosure statement
The views expressed herein are those of the authors and do not necessarily reflect the position of the Federation of State Medical Boards or National Board of Medical Examiners. Information reported in this manuscript has not been previously presented at a conference. Data were collected from the artificial intelligence chatbot ChatGPT developed by OpenAI. As corresponding author, Rajeev H. Muni had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Data availability statement
The data that support the findings of this study may be requested at andrew.mihalache@mail.utoronto.ca, with support from the principal investigator RHM.

Funding
MMP: Financial support (to institution) – PSI Foundation, Fighting Blindness Canada. RHM: Consultant – Alcon, Apellis, AbbVie, Bayer, Bausch Health, Roche; Financial support (to institution) – Alcon, AbbVie, Bayer, Novartis, Roche.

Notes on contributors
Andrew Mihalache is an MD candidate at the University of Toronto in Toronto, Ontario, under the Temerty Faculty of Medicine.
Ryan S. Huang is an MD candidate at the University of Toronto in Toronto, Ontario, under the Temerty Faculty of Medicine.
Marko M. Popovic is the Chief Ophthalmology Resident in the Department of Ophthalmology and Vision Sciences at the University of Toronto and has completed a Master of Public Health at the Harvard T.H. Chan School of Public Health.
Rajeev H. Muni is a staff vitreoretinal surgeon at St. Michael's Hospital in Toronto, Ontario, and Associate Professor and Vice-Chair of Clinical Research in the Department of Ophthalmology and Vision Sciences at the University of Toronto.
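The response-length statistics reported in the Results can be sanity-checked from the summary values alone. A minimal sketch, assuming (as the reported values suggest) that the authors used a two-sided two-sample t-test on the 319 analyzed questions:

```python
# Consistency check of the reported statistics for the response-length
# comparison (mean length of incorrect minus correct ChatGPT-4 responses).
# All numbers are taken from the abstract; the test form is an assumption.
diff = 79.58                   # mean length difference, in characters
se = 35.42                     # standard error of the difference
ci_lo, ci_hi = 9.89, 149.28    # reported 95% confidence interval

# The t statistic is the mean difference divided by its standard error;
# the abstract reports t = 2.25.
t_stat = diff / se

# The CI half-width divided by the SE recovers the critical value used:
# about 1.97, the two-sided 95% t quantile at roughly 319 - 2 = 317
# degrees of freedom (slightly wider than the normal value 1.96).
t_crit = (ci_hi - ci_lo) / (2 * se)

print(round(t_stat, 2), round(t_crit, 2))  # prints: 2.25 1.97
```

The three reported quantities (difference, SE, and CI) are thus internally consistent with each other and with the reported t = 2.25 and p = 0.03.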