Authors
Theodore R. Pak, Sanjat Kanjilal, Caroline McKenna, Alexander Hoffner-Heinike, Chanu Rhee, Michael Klompas
Abstract
Importance: Presenting signs and symptoms affect the care of patients with possible sepsis. However, signs and symptoms are not incorporated into most large observational studies because they are difficult to extract from clinical notes at scale.
Objective: To assess the use of large language models (LLMs) to extract presenting signs and symptoms from admission notes and characterize their associations with infectious diagnoses, multidrug-resistant infections, and mortality.
Design, Setting, and Participants: This retrospective cohort study obtained data from 5 Massachusetts hospitals within 1 health care system between June 1, 2015, and August 1, 2022. Participants were hospitalized adult patients with possible infection (determined by blood culture drawn and intravenous antibiotics administered within 24 hours of arrival). An LLM (LLaMA 3 8B; Meta) was used to extract up to 10 presenting signs and symptoms from each patient’s history-and-physical admission notes. LLM-generated labels were validated by blinded review of 303 random admission notes. Data analyses were performed from July 2023 to August 2025.
Exposures: The 30 most common signs and symptoms were retained as exposures, and unsupervised clustering was used to group them into syndromes, which were compared with infection sources derived from the International Statistical Classification of Diseases, Tenth Revision, Clinical Modification discharge codes.
Main Outcomes and Measures: Outcomes included positive cultures for methicillin-resistant Staphylococcus aureus (MRSA), positive cultures for multidrug-resistant gram-negative (MDRGN) organisms, and in-hospital mortality. Multivariable logistic regression was used to adjust for demographics, comorbidities, physiologic markers of severity of illness, and time to antibiotics.
Results: Among the 104 248 patients (median [IQR] age, 66 [52-78] years; 54 137 males [51.9%]) included, 23 619 (22.7%) had sepsis without shock, 25 990 (24.9%) had septic shock, and 94 913 (91.0%) had 1 or more admission notes within 24 hours. The LLM labeled the notes of 93 674 of 94 913 patients (98.7%). On manual validation, LLM labels had an accuracy of 99.3% (95% CI, 99.2%-99.3%), balanced accuracy of 84.6% (95% CI, 83.5%-85.8%), positive predictive value of 68.4% (95% CI, 66.0%-70.7%), sensitivity of 69.7% (95% CI, 67.3%-72.0%), and specificity of 99.6% (95% CI, 99.6%-99.6%) compared with the physician medical record reviewer. The 30 most common signs and symptoms were clustered into syndromes that correlated with infection sources. Presence of skin and soft tissue symptoms (adjusted odds ratio [AOR], 1.73; 95% CI, 1.49-2.00) and absence of gastrointestinal (AOR, 0.63; 95% CI, 0.54-0.73) or urinary tract symptoms (AOR, 0.34; 95% CI, 0.22-0.50) were associated with MRSA culture positivity; inverse associations were seen for MDRGN organisms. Cardiopulmonary symptoms were associated with increased mortality (AOR, 1.30; 95% CI, 1.17-1.45).
Conclusions and Relevance: This cohort study found that an LLM accurately extracted presenting signs and symptoms from admission notes that clustered into syndromes differentially correlated with infection sources, multidrug-resistant infections, and mortality. Further research is warranted to evaluate the value of large-scale sign-and-symptom data in models of antibiotic choice, effectiveness, and outcomes in patients with possible sepsis.
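The validation metrics reported in the Results (accuracy, balanced accuracy, positive predictive value, sensitivity, specificity) are all functions of one confusion matrix. The following is a minimal sketch of those definitions, assuming the metrics were pooled over (note, label) pairs; the abstract does not state the pooling scheme. The class `ConfusionCounts`, the function `validation_metrics`, and the example counts are hypothetical and are not the study's data. The sketch illustrates how accuracy can be very high while positive predictive value and sensitivity stay moderate when most candidate labels are absent from most notes, and it checks that the reported balanced accuracy agrees with the mean of the reported sensitivity and specificity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of (note, label) pairs; pooling across labels is an assumption."""
    tp: int  # LLM and reviewer both marked the label present
    fp: int  # LLM marked present, reviewer marked absent
    fn: int  # LLM marked absent, reviewer marked present
    tn: int  # LLM and reviewer both marked the label absent


def validation_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard definitions of the five metrics named in the abstract."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "ppv": c.tp / (c.tp + c.fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical counts (NOT from the study): labels are sparse, so true
# negatives dominate and accuracy is high despite moderate PPV/sensitivity.
example = ConfusionCounts(tp=70, fp=30, fn=30, tn=9870)
metrics = validation_metrics(example)

# Consistency check using the values reported in the abstract:
# balanced accuracy should equal the mean of sensitivity and specificity.
reported_sensitivity = 0.697
reported_specificity = 0.996
reported_balanced = (reported_sensitivity + reported_specificity) / 2

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
    print(f"mean of reported sensitivity and specificity: {reported_balanced:.4f}")
```

The mean of the reported sensitivity (69.7%) and specificity (99.6%) is 84.65%, which matches the reported balanced accuracy of 84.6% up to rounding of the published figures.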