Using Pretrained Large Language Models for AI-Driven Assessment in Medical Education

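Subject areas: Medical education, Computer science, Natural language processing, Artificial intelligence, Medicine
Authors
Jacob Cole, Joshua Duncan, Rebekah Cole
Source
Journal: Academic Medicine [Lippincott Williams & Wilkins]
Volume 100, Issue 12, pages 1442-1446
Times cited: 2
Identifier
DOI: 10.1097/acm.0000000000006207
Abstract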

PROBLEM: Assessing students in competency-based medical education can be time-consuming and demanding for faculty, especially with large classes and complex topics. Traditional methods can lead to inconsistent grading and a lack of targeted feedback. Innovative, accessible solutions are needed to improve the efficiency, objectivity, and effectiveness of assessment in medical education.

APPROACH: From September 2024 to February 2025, the authors piloted the use of large language models (LLMs) with retrieval-augmented generation to assess students' understanding of moral injury. The authors selected 6 seminal articles on moral injury within military and veteran populations and uploaded them to Google Gemini 1.5 Pro. They then tasked the same LLM with creating a grading rubric, based on these articles, for assessing 165 student responses in a military medical ethics course (Uniformed Services University of the Health Sciences). The authors uploaded both the generated rubric and the student responses to each of 3 LLMs (Google Gemini 1.5 Pro, Google Gemini 2.0 Flash, and OpenAI ChatGPT-4o), with a prompt instructing the model to score the student responses.

OUTCOMES: In the authors' expert opinion, an LLM (Google Gemini 1.5 Pro) successfully generated a grading rubric that captured the nuances of moral injury and its implications for military medical practice. To generate validity evidence, the LLMs' scores were compared against those of 2 experienced educators. The best-performing model, OpenAI ChatGPT-4o, demonstrated interrater reliabilities of 0.77 and 0.68 with reviewers 1 and 2, respectively. The LLM therefore agreed more closely with each individual reviewer than the 2 reviewers agreed with each other (0.57).

NEXT STEPS: While this approach shows promise, faculty oversight is necessary to ensure ethical accountability and to address potential biases. Further research is needed to optimize the integration of AI and human capabilities in assessment, with the ultimate aims of enhancing the quality of health care professional education and improving patient outcomes.
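The abstract reports pairwise interrater reliability values (LLM vs. each reviewer, and reviewer vs. reviewer) but does not name the statistic used. As an illustration only, the sketch below assumes unweighted Cohen's kappa. It shows how such pairwise agreement figures can be computed from two lists of rubric scores. The rater names and scores are hypothetical and are not data from the study. For ordinal rubric scores, a weighted kappa (for example, `weights="quadratic"` in scikit-learn's `cohen_kappa_score`) or an intraclass correlation may be the more appropriate choice.

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters who scored the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items that received identical scores.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement implied by each rater's own score distribution.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        # Both raters used one and the same category for every item;
        # kappa is undefined, and 1.0 is returned here by convention.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical rubric scores for six responses (NOT data from the study).
scores = {
    "LLM": [3, 2, 4, 4, 1, 3],
    "Reviewer 1": [3, 2, 4, 3, 1, 3],
    "Reviewer 2": [2, 2, 4, 4, 1, 2],
}

# Agreement for every pair of raters, mirroring the LLM-vs-reviewer and
# reviewer-vs-reviewer comparisons described in the abstract.
pairwise = {
    (a, b): cohens_kappa(scores[a], scores[b])
    for a, b in combinations(scores, 2)
}
for (a, b), kappa in pairwise.items():
    print(f"{a} vs {b}: kappa = {kappa:.2f}")
```

Kappa corrects raw percent agreement for the agreement expected by chance from each rater's score distribution. For that reason, two raters can agree on most items and still obtain a modest kappa when they both favour the same few scores.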
