Performance of ChatGPT on Ophthalmology-Related Questions Across Various Examination Levels: Observational Study

Keywords: Medicine; United States Medical Licensing Examination; Ophthalmology; Licensure; Medical school; Medical education; Family medicine

Authors
Firas Haddad, Joanna S Saade

Source
Journal: JMIR Medical Education [JMIR Publications Inc.]
Volume/issue: 10: e50842 | Cited by: 2

Identifier
DOI: 10.2196/50842

Abstract

Background: ChatGPT and large language models have recently gained attention for their ability to answer questions on examinations across various disciplines. Whether ChatGPT could be used to aid medical education remains an open question, particularly in the field of ophthalmology.

Objective: The aim of this study was to assess the ability of ChatGPT-3.5 (GPT-3.5) and ChatGPT-4.0 (GPT-4.0) to answer ophthalmology-related questions across different levels of ophthalmology training.

Methods: Questions from the United States Medical Licensing Examination (USMLE) steps 1 (n=44), 2 (n=60), and 3 (n=28) were extracted from AMBOSS, and 248 questions (64 easy, 122 medium, and 62 difficult) were extracted from the book Ophthalmology Board Review Q&A, covering the Ophthalmic Knowledge Assessment Program and the Board of Ophthalmology (OB) Written Qualifying Examination (WQE). Questions were prompted identically and inputted to GPT-3.5 and GPT-4.0.

Results: GPT-3.5 answered 55% (n=210) of all questions correctly, while GPT-4.0 answered 70% (n=270) correctly. GPT-3.5 answered 75% (n=33) of questions correctly on USMLE step 1, 73.33% (n=44) on USMLE step 2, 60.71% (n=17) on USMLE step 3, and 46.77% (n=116) on the OB-WQE. GPT-4.0 answered 70.45% (n=31) of questions correctly on USMLE step 1, 90.32% (n=56) on USMLE step 2, 96.43% (n=27) on USMLE step 3, and 62.90% (n=156) on the OB-WQE. GPT-3.5 performed worse as examination levels advanced (P<.001), while GPT-4.0 performed better on USMLE steps 2 and 3 and worse on USMLE step 1 and the OB-WQE (P<.001). The correlation coefficient (r) between ChatGPT answering correctly and human users answering correctly was 0.21 (P=.01) for GPT-3.5, compared with -0.31 (P<.001) for GPT-4.0. GPT-3.5 performed similarly across difficulty levels, while GPT-4.0 performed worse as the difficulty level increased. Both GPT models performed significantly better on certain topics than on others.

Conclusions: ChatGPT is far from being considered a part of mainstream medical education. Future models with higher accuracy are needed for the platform to be effective in medical education.
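The reported r values compare each model's per-question correctness (a binary outcome) with the percentage of human users who answered the same question correctly. A minimal sketch of that computation is shown below; the data values are hypothetical illustrations, not the study's actual dataset, and the study's own statistical software is not specified here.

```python
# Illustrative sketch (hypothetical data): Pearson correlation between
# model correctness per question (1 = correct, 0 = incorrect) and the
# fraction of human test-takers answering that question correctly.
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical example: 8 questions, model correctness paired with
# human per-question accuracy.
model_correct = [1, 0, 1, 1, 0, 1, 0, 0]
human_accuracy = [0.82, 0.45, 0.90, 0.66, 0.38, 0.71, 0.52, 0.30]

print(round(pearson_r(model_correct, human_accuracy), 2))
```

A positive r (as found for GPT-3.5) means the model tended to get right the questions humans also found easy; the negative r for GPT-4.0 indicates its errors did not track human difficulty in the same way.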