Comparison of Large Language Models’ Performance on 600 Nuclear Medicine Technology Board Examination–Style Questions

Authors
Michael A Oumano, Shawn M Pickett
Source
Journal: Journal of Nuclear Medicine Technology [Society of Nuclear Medicine]
Volume/Issue: jnmt.124.269335
Identifier
DOI: 10.2967/jnmt.124.269335
Abstract

This study investigated the application of large language models (LLMs), with and without retrieval-augmented generation (RAG), in nuclear medicine. It focused on their performance across topics relevant to the field, to evaluate their potential as reliable tools for professional education and clinical decision-making.

Methods: We evaluated the performance of LLMs, including the OpenAI GPT-4o series, Google Gemini, Cohere, Anthropic, and Meta Llama3 models, across 15 nuclear medicine topics. Accuracy was assessed on a set of 600 sample questions covering a range of clinical and technical domains in nuclear medicine. Overall accuracy was calculated by averaging performance across these topics, and additional comparisons were made between individual models.

Results: OpenAI's models, particularly openai_nvidia_gpt-4o_final and openai_mxbai_gpt-4o_final, achieved the highest overall accuracy, scoring 0.787 and 0.783, respectively, with RAG. Anthropic Opus and Google Gemini 1.5 Pro followed closely, with overall accuracy scores of 0.773 and 0.750, respectively, with RAG. Cohere and Llama3 models showed more variable performance, and the Llama3 model ollama_llama3 (without RAG) had the lowest accuracy. Discrepancies in question interpretation were noted, particularly for complex clinical guidelines and imaging-based questions.

Conclusion: LLMs show promising potential in nuclear medicine for improving diagnostic accuracy, especially in areas such as radiation safety and skeletal system scintigraphy. This study also demonstrates that adding a RAG workflow can increase the accuracy of an off-the-shelf model. However, challenges persist in handling nuanced guidelines and visual data, underscoring the need for further optimization of LLMs for medical applications.
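The abstract states that overall accuracy was obtained by averaging performance across the 15 topics. The sketch below illustrates that calculation under one assumption: an unweighted (macro) mean of per-topic accuracies. The function names `per_topic_accuracy` and `overall_accuracy` and the graded answers are hypothetical illustrations, not the authors' code or data.

```python
from collections import defaultdict


def per_topic_accuracy(results):
    """Return {topic: fraction correct} from (topic, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for topic, is_correct in results:
        totals[topic] += 1
        correct[topic] += int(is_correct)
    return {topic: correct[topic] / totals[topic] for topic in totals}


def overall_accuracy(results):
    """Unweighted mean of per-topic accuracies (assumed macro-average)."""
    by_topic = per_topic_accuracy(results)
    if not by_topic:
        raise ValueError("no graded answers supplied")
    return sum(by_topic.values()) / len(by_topic)


# Hypothetical graded answers for illustration only; not data from the study.
graded = [
    ("radiation safety", True),
    ("radiation safety", True),
    ("radiation safety", True),
    ("radiation safety", False),
    ("skeletal scintigraphy", True),
    ("skeletal scintigraphy", False),
]

print(per_topic_accuracy(graded))
print(overall_accuracy(graded))
```

When topics have unequal question counts, this macro-average can differ from the pooled fraction correct (here 0.625 versus 4/6, about 0.667). When every topic has the same number of questions, the two coincide.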