The performance of ChatGPT on orthopaedic in-service training exams: A comparative study of the GPT-3.5 turbo and GPT-4 models in orthopaedic education

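Medicine, Turbo, Inference, Military service, Internal medicine, Artificial intelligence, Computer science, Engineering, History, Archaeology, Automotive engineering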
Authors
Michael G. Rizzo, Na Cai, David S. Constantinescu
Source
Journal: Journal of Orthopaedics [Elsevier]
Volume and pages: 50: 70-75 Citations: 4
Identifier
DOI:10.1016/j.jor.2023.11.056
Abstract

The rapid advancement of artificial intelligence (AI), particularly the development of Large Language Models (LLMs) such as Generative Pretrained Transformers (GPTs), has revolutionized numerous fields. The purpose of this study is to investigate the application of LLMs to orthopaedic in-service training examinations. Questions from the 2020–2022 Orthopaedic In-Service Training Exams (OITEs) were given to OpenAI's GPT-3.5 Turbo and GPT-4 LLMs, using a zero-shot inference approach. Each model was given each multiple-choice question without prior exposure to similar queries, and its generated response was compared to the correct answer for that OITE. The models were evaluated on overall accuracy, performance on questions with and without media, and performance on first- and higher-order questions. The GPT-4 model outperformed the GPT-3.5 Turbo model across all years and question categories (2022: 67.63% vs. 50.24%; 2021: 58.69% vs. 47.42%; 2020: 59.53% vs. 46.51%). Both models performed better on questions without associated media, with GPT-4 attaining accuracies of 68.80%, 65.14%, and 68.22% for 2022, 2021, and 2020, respectively. GPT-4 outscored GPT-3.5 Turbo on first-order questions across all years (2022: 63.83% vs. 38.30%; 2021: 57.45% vs. 50.00%; 2020: 65.74% vs. 53.70%). GPT-4 also outscored GPT-3.5 Turbo on higher-order questions across all years (2022: 68.75% vs. 53.75%; 2021: 59.66% vs. 45.38%; 2020: 53.27% vs. 39.25%). GPT-4 showed improved performance compared to GPT-3.5 Turbo in all tested categories. The results reflect the potential and limitations of AI in orthopaedics. GPT-4's performance is comparable to that of a second-to-third-year resident and GPT-3.5 Turbo's to that of a first-year resident, suggesting that current LLMs can neither pass the OITE nor substitute for orthopaedic training.
This study sets a precedent for future endeavors integrating GPT models into orthopaedic education and underlines the necessity for specialized training of these models for specific medical domains.
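
The scoring described in the abstract (overall accuracy, then accuracy split by media and by question order) can be sketched as below. This is a minimal illustration only, not the authors' code: the record fields (`has_media`, `order`, `model_answer`, `correct_answer`), the helper `accuracy_by_group`, and the toy records are all hypothetical and do not come from the study.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group value: percent correct} for graded question records."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        # A response counts as correct only if it matches the answer key exactly.
        if rec["model_answer"] == rec["correct_answer"]:
            correct[group] += 1
    return {g: 100.0 * correct[g] / totals[g] for g in totals}


# Hypothetical toy records for illustration; these are NOT data from the study.
toy_records = [
    {"has_media": False, "order": "first", "model_answer": "A", "correct_answer": "A"},
    {"has_media": True, "order": "higher", "model_answer": "B", "correct_answer": "C"},
    {"has_media": False, "order": "higher", "model_answer": "D", "correct_answer": "D"},
    {"has_media": True, "order": "first", "model_answer": "C", "correct_answer": "C"},
]

# Overall accuracy: share of all questions answered correctly, as a percentage.
n_correct = sum(r["model_answer"] == r["correct_answer"] for r in toy_records)
overall = 100.0 * n_correct / len(toy_records)

# Category breakdowns corresponding to the comparisons named in the abstract.
by_media = accuracy_by_group(toy_records, "has_media")
by_order = accuracy_by_group(toy_records, "order")
```

Running the same function once per model and per exam year would yield the kind of side-by-side percentages the abstract reports, assuming each response has already been reduced to a single answer letter.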