Large Language Model Clinical Vignettes and Multiple-Choice Questions for Postgraduate Medical Education

Fields: Medical education, Higher education, MEDLINE, Psychology, Medicine, Political science, Law
Authors
Frank I. Jackson, Nathan Keller, Insaf Kouba, Wassil Kouba, Luis A. Bracero, Matthew J. Blitz
Source
Journal: Academic Medicine [Lippincott Williams & Wilkins]
Volume 100, Issue 10, pp. 1163-1166; cited by 6
Identifier
DOI: 10.1097/acm.0000000000006137
Abstract

PROBLEM: Clinical vignette-based multiple-choice questions (MCQs) have been used to assess postgraduate medical trainees but require substantial time and effort to develop. Large language models, a type of artificial intelligence (AI), can potentially expedite this task. This report describes prompt engineering techniques used with ChatGPT-4 to generate clinical vignettes and MCQs for obstetrics-gynecology residents and evaluates whether residents and attending physicians can differentiate between human- and AI-generated content.
APPROACH: The authors generated MCQs using a structured prompt engineering approach that incorporated authoritative source documents and an iterative prompt chaining technique to refine output quality. Fifty human-generated and 50 AI-generated MCQs were randomly arranged into 10 quizzes (10 questions each). The AI-generated MCQs were developed in August 2024, and the surveys were conducted in September 2024. Obstetrics-gynecology residents and attending physician faculty members at Northwell Health or the Donald and Barbara Zucker School of Medicine at Hofstra/Northwell completed an online survey, answering each MCQ and indicating whether they believed it was human- or AI-written or whether they were uncertain.
OUTCOMES: Thirty-three participants (16 residents, 17 attendings) completed the survey (80.5% response rate). Respondents correctly identified MCQ authorship a median (interquartile range [IQR]) of 39.1% (30.0%-50.0%) of the time, indicating difficulty in distinguishing human- from AI-generated questions. The median (IQR) correct answer selection rate was 62.3% (50.0%-75.0%) for human-generated MCQs and 64.4% (50.0%-83.3%) for AI-generated MCQs (P = .74). The difficulty (0.69 vs 0.66, P = .83) and discrimination (0.42 vs 0.38, P = .90) indexes showed no significant differences, supporting the feasibility of large language model-generated MCQs in medical education.
NEXT STEPS: Future studies should explore the optimal balance between AI-generated content and expert review, identifying strategies to maximize efficiency without compromising accuracy. The authors will develop practice exams and assess their predictive validity by comparing scores with standardized exam results.
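
The difficulty and discrimination indexes in the abstract are classical test theory item statistics. The abstract does not say exactly how they were computed, so the sketch below assumes the common textbook definitions. Difficulty is taken as the proportion of examinees who answer an item correctly. Discrimination is taken as the upper-lower index D = p_upper - p_lower, where the upper and lower groups are the top and bottom 27% of examinees by total score. Some studies instead use a point-biserial correlation. The function names and the toy response matrix are hypothetical illustrations, not the authors' code or study data.

```python
from typing import List, Sequence


def difficulty_index(item_scores: Sequence[int]) -> float:
    """Classical difficulty index: proportion of examinees answering the item
    correctly (1 = correct, 0 = incorrect). Higher values mean easier items."""
    if not item_scores:
        raise ValueError("need at least one examinee")
    return sum(item_scores) / len(item_scores)


def discrimination_index(
    responses: List[List[int]], item: int, group_fraction: float = 0.27
) -> float:
    """Upper-lower discrimination index D = p_upper - p_lower for one item.

    responses[i][j] is 1 if examinee i answered item j correctly, else 0.
    Examinees are ranked by total score; the top and bottom `group_fraction`
    of examinees form the upper and lower groups (assumed 27% convention).
    Ties in total score keep their original order (sorted() is stable).
    """
    if not 0 < group_fraction <= 0.5:
        raise ValueError("group_fraction must be in (0, 0.5]")
    n = len(responses)
    k = max(1, round(n * group_fraction))  # size of each comparison group
    ranked = sorted(responses, key=sum, reverse=True)  # highest total first
    p_upper = sum(row[item] for row in ranked[:k]) / k
    p_lower = sum(row[item] for row in ranked[-k:]) / k
    return p_upper - p_lower


if __name__ == "__main__":
    # Hypothetical 0/1 response matrix: 6 examinees x 3 items (not study data).
    toy = [
        [1, 1, 1],
        [1, 1, 0],
        [1, 0, 1],
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 0],
    ]
    for j in range(3):
        p = difficulty_index([row[j] for row in toy])
        d = discrimination_index(toy, j)
        print(f"item {j}: difficulty={p:.2f}, discrimination={d:.2f}")
```

With six examinees, each comparison group holds round(6 x 0.27) = 2 people. The toy run prints the following values:

- item 0: difficulty 0.67, discrimination 1.00
- item 1: difficulty 0.50, discrimination 0.50
- item 2: difficulty 0.33, discrimination 0.50

Item 0 separates strong and weak examinees perfectly. Values near 0 (or negative) would flag an item that fails to discriminate.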
