Certification
Radiological weapon
Computer science
Context (archaeology)
Artificial intelligence
Human health
Set (abstract data type)
Machine learning
Medical physics
Medicine
Surgery
Biology
Programming language
Paleontology
Environmental health
Political science
Law
Authors
Grace Roemer, A Li, Usman Mahmood, Lawrence T. Dauer, Michael Bellamy
Identifier
DOI: 10.1088/1361-6498/ad1fdf
Abstract
This study assesses the efficacy of the Generative Pre-Trained Transformer (GPT) models published by OpenAI in the specialized domains of radiological protection and health physics. Using a set of 1064 surrogate questions designed to mimic a health physics certification exam, we evaluated the models' ability to respond accurately to questions across five knowledge domains. Neither model met the 67% passing threshold: GPT-3.5 achieved a 45.3% weighted average, while GPT-4 attained 61.7%. Although GPT-4's substantially larger parameter count and multimodal capabilities yielded superior performance in every category, it still fell short of a passing score. The study's methodology used a simple, standardized prompting strategy without prompt engineering or in-context learning, both of which are known to potentially enhance performance. The analysis also revealed that GPT-3.5 formatted its answers correctly more often, despite GPT-4's higher overall accuracy. These findings suggest that while GPT-3.5 and GPT-4 show promise in handling domain-specific content, their application in the field of radiological protection should be approached with caution, with an emphasis on human oversight and verification.
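The pass/fail criterion described above (a weighted average across five knowledge domains compared against a 67% threshold) can be sketched as follows. This is a minimal illustration, not the study's actual scoring code; the domain tuples below are fictional placeholders, and only the 67% threshold comes from the abstract.

```python
# Hypothetical sketch of weighted-average exam scoring.
# Only the 67% passing threshold is taken from the abstract; the
# per-domain numbers below are invented for illustration.

PASSING_THRESHOLD = 0.67


def weighted_average(results):
    """results: list of (num_correct, num_questions) per knowledge domain.

    Weighting each domain by its share of the total question count is
    equivalent to pooling all questions and dividing correct by total.
    """
    total_questions = sum(n for _, n in results)
    total_correct = sum(c for c, _ in results)
    return total_correct / total_questions


def passes(results):
    """True if the weighted average meets or exceeds the passing threshold."""
    return weighted_average(results) >= PASSING_THRESHOLD


# Illustrative example with two fictional domains: 320/700 ~ 45.7%, a fail.
example = [(120, 300), (200, 400)]
print(f"score = {weighted_average(example):.1%}, pass = {passes(example)}")
```

A per-question pool like this matches the "weighted average" phrasing under the assumption that domains are weighted by question count; if the exam weighted domains differently, the weights would replace the simple totals.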