ChatGPT-4: An assessment of an upgraded artificial intelligence chatbot in the United States Medical Licensing Examination

Authors
Andrew Mihalache, Ryan S. Huang, Marko Popović, Rajeev H. Muni
Source
Journal: Medical Teacher [Informa]
Pages: 1-7; Cited by: 3
Identifier
DOI: 10.1080/0142159x.2023.2249588
Abstract

Purpose: ChatGPT-4 is an upgraded version of an artificial intelligence chatbot. The performance of ChatGPT-4 on the United States Medical Licensing Examination (USMLE) has not been independently characterized. We aimed to assess the performance of ChatGPT-4 at responding to USMLE Step 1, Step 2 CK, and Step 3 practice questions.

Method: Practice multiple-choice questions for the USMLE Step 1, Step 2 CK, and Step 3 were compiled. Of 376 available questions, 319 (85%) were analyzed by ChatGPT-4 on March 21st, 2023. Our primary outcome was the performance of ChatGPT-4 on the practice USMLE Step 1, Step 2 CK, and Step 3 examinations, measured as the proportion of multiple-choice questions answered correctly. Our secondary outcomes were the mean length of questions and responses provided by ChatGPT-4.

Results: ChatGPT-4 responded to 319 text-based multiple-choice questions from USMLE practice test material. ChatGPT-4 answered 82 of 93 (88%) questions correctly on USMLE Step 1, 91 of 106 (86%) on Step 2 CK, and 108 of 120 (90%) on Step 3. ChatGPT-4 provided explanations for all questions. ChatGPT-4 spent on average 30.8 ± 11.8 s per question on practice questions for USMLE Step 1, 23.0 ± 9.4 s per question for Step 2 CK, and 23.1 ± 8.3 s per question for Step 3. The mean length of practice USMLE multiple-choice questions that were answered correctly and incorrectly by ChatGPT-4 was similar (difference = 17.48 characters, SE = 59.75, 95% CI = [-100.09, 135.04], t = 0.29, p = 0.77). The mean length of ChatGPT-4's correct responses to practice questions was significantly shorter than the mean length of incorrect responses (difference = 79.58 characters, SE = 35.42, 95% CI = [9.89, 149.28], t = 2.25, p = 0.03).

Conclusions: ChatGPT-4 answered a remarkably high proportion of practice questions correctly for USMLE examinations. ChatGPT-4 performed substantially better at USMLE practice questions than previous models of the same AI chatbot.

Keywords: artificial intelligence; natural language processing; United States Medical Licensing Examination; ChatGPT-4

Disclosure statement
The views expressed herein are those of the authors and do not necessarily reflect the position of the Federation of State Medical Boards or National Board of Medical Examiners. Information reported in this manuscript has not been previously presented at a conference. Data were collected from the artificial intelligence chatbot ChatGPT developed by OpenAI. As corresponding author, Rajeev H. Muni had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Data availability statement
The data that support the findings of this study may be requested at andrew.mihalache@mail.utoronto.ca, with support from the principal investigator RHM.

Funding
MMP: Financial support (to institution) – PSI Foundation, Fighting Blindness Canada. RHM: Consultant – Alcon, Apellis, AbbVie, Bayer, Bausch Health, Roche; Financial support (to institution) – Alcon, AbbVie, Bayer, Novartis, Roche.

Notes on contributors
Andrew Mihalache is an MD candidate at the University of Toronto in Toronto, Ontario, under the Temerty Faculty of Medicine.
Ryan S. Huang is an MD candidate at the University of Toronto in Toronto, Ontario, under the Temerty Faculty of Medicine.
Marko M. Popovic is the Chief Ophthalmology Resident in the Department of Ophthalmology and Vision Sciences at the University of Toronto and has completed a Master of Public Health at the Harvard T.H. Chan School of Public Health.
Rajeev H. Muni is a staff vitreoretinal surgeon at St. Michael's Hospital in Toronto, Ontario, and Associate Professor and Vice-Chair of Clinical Research in the Department of Ophthalmology and Vision Sciences at the University of Toronto.
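The response-length statistics reported in the Results can be sanity-checked from the summary values alone. A minimal sketch, assuming (as the reported values suggest) that the authors used a two-sided two-sample t-test on the 319 analyzed questions:

```python
# Consistency check of the reported statistics for the response-length
# comparison (mean length of incorrect minus correct ChatGPT-4 responses).
# All numbers are taken from the abstract; the test form is an assumption.
diff = 79.58                   # mean length difference, in characters
se = 35.42                     # standard error of the difference
ci_lo, ci_hi = 9.89, 149.28    # reported 95% confidence interval

# The t statistic is the mean difference divided by its standard error;
# the abstract reports t = 2.25.
t_stat = diff / se

# The CI half-width divided by the SE recovers the critical value used:
# about 1.97, the two-sided 95% t quantile at roughly 319 - 2 = 317
# degrees of freedom (slightly wider than the normal value 1.96).
t_crit = (ci_hi - ci_lo) / (2 * se)

print(round(t_stat, 2), round(t_crit, 2))  # prints: 2.25 1.97
```

The three reported quantities (difference, SE, and CI) are thus internally consistent with each other and with the reported t = 2.25 and p = 0.03.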