Authors
Ryan T. Halvorson, Timothy Keeley, Kian Niknam, Travis Zack, S. Majumdar, B. Feeley, Alan L Zhang, Drew A. Lansdown
Abstract
PURPOSE: To validate the performance of a pretrained large language model (LLM) in predicting orthopaedic surgeon recommendations for management of newly referred patients, using free-text previsit questionnaire responses as input.
METHODS: This retrospective cross-sectional study included new patients visiting an orthopaedic sports medicine clinic between 2020 and 2023. Using zero-shot prompting, the LLM analyzed previsit questionnaire responses (e.g., "When did you start to have pain?") to predict whether patients required advanced imaging and/or surgical intervention. The LLM was blinded to all other clinical information, including surgeon notes, physical exams, and referral data. Model predictions were evaluated for accuracy, sensitivity, and specificity against the actual surgeon-generated plans. For a subset of patients who had undergone advanced imaging, the LLM was augmented with free-text radiology reports and asked to provide updated surgical recommendations.
RESULTS: In the combined cohort of 1141 patients, the LLM predicted surgeon recommendations for advanced imaging with 70% accuracy, 83% sensitivity, and 64% specificity using previsit questionnaire responses alone. Imaging predictions were accurate for common diagnoses, including anterior cruciate ligament (ACL, 94%), meniscus (85%), and rotator cuff (80%) injuries, but poor for knee arthritis (54%) and shoulder arthritis (66%). When augmented with imaging reports, the LLM predicted recommendations for surgery with 81% accuracy, 88% sensitivity, and 72% specificity. Surgical predictions were highly accurate for ACL (93%), meniscus (78%), rotator cuff (83%), and shoulder instability-related pathologies (78%).
CONCLUSIONS: Using previsit questionnaire data from new orthopaedic patients with knee and shoulder complaints, the pretrained LLM showed 70% accuracy for imaging recommendations, and the augmented surgical-decision LLM showed 81% accuracy for surgical recommendations.
LEVEL OF EVIDENCE: Level III, retrospective diagnostic case-control study.
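The abstract reports accuracy, sensitivity, and specificity of the LLM predictions against surgeon-generated plans. As a minimal sketch of how those three metrics are conventionally computed for a binary recommendation (for example, "advanced imaging recommended: yes/no"), the following Python example may help. The names `BinaryMetrics` and `evaluate` are hypothetical, and the toy labels are invented for illustration; they are not data from the study and do not reflect the authors' actual analysis code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryMetrics:
    """Standard metrics for a binary prediction task (hypothetical helper type)."""
    accuracy: float
    sensitivity: float
    specificity: float


def evaluate(predicted: Sequence[bool], actual: Sequence[bool]) -> BinaryMetrics:
    """Compare model predictions with reference labels (e.g., surgeon plans).

    True means "recommended" (imaging or surgery); False means "not recommended".
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one case is required")

    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives

    nan = float("nan")
    return BinaryMetrics(
        accuracy=(tp + tn) / len(pairs),
        # Sensitivity: share of surgeon-recommended cases the model also flagged.
        sensitivity=tp / (tp + fn) if (tp + fn) else nan,
        # Specificity: share of not-recommended cases the model also left unflagged.
        specificity=tn / (tn + fp) if (tn + fp) else nan,
    )


# Toy example with invented labels (NOT study data).
model_says_imaging = [True, True, False, True, False, False]
surgeon_ordered_imaging = [True, False, False, True, True, False]
metrics = evaluate(model_says_imaging, surgeon_ordered_imaging)
print(metrics)
```

Under this convention, a model with higher sensitivity than specificity, as reported for the imaging predictions, tends to over-recommend rather than under-recommend relative to the surgeon.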