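Recall
Computer science
Confusion matrix
Set (abstract data type)
Confusion
Psychological intervention
Social media
Artificial intelligence
Natural language processing
Machine learning
Applied psychology
Information retrieval
Medicine
Psychology
World Wide Web
Nursing
Psychoanalysis
Cognitive psychology
Programming language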
Authors
Zifan Gu,Li He,Ayesha Naeem,Pui Man Chan,A. A. Mohamed,Heba M. A. Khalil,Yujia Guo,Jingwei Huang,Ismael Villanueva-Miranda,Ying Ding,Wenqi Shi,Matthew E. Dupre,Guanghua Xiao,Eric D. Peterson,Yang Xie,Ann Marie Návar,Donghan M. Yang
Identifier
DOI:10.1093/jamia/ocaf124
Abstract
Objective Social and behavioral determinants of health (SBDH) are increasingly recognized as essential for prognostication and informing targeted interventions. Clinical notes often contain details about SBDH in unstructured format. Conventional extraction methods for these data tend to be labor intensive, inaccurate, and/or unscalable. In this study, we aim to develop and validate a large language model (LLM)-powered method to extract structured SBDH data from clinical notes through prompt engineering.
Materials and Methods We developed SBDH-Reader to extract 6 categories of granular SBDH data by prompting GPT-4o: employment, housing, marital status, and three types of substance use (alcohol, tobacco, and drug use). SBDH-Reader was developed using 7225 notes from 6382 patients in the MIMIC-III database (2001–2012) and externally validated using 971 notes from 437 patients at The University of Texas Southwestern Medical Center (UTSW; 2022–2023). We evaluated SBDH-Reader’s performance against human-annotated ground truths based on precision, recall, F1, and the confusion matrix.
Results When tested on the UTSW validation set, SBDH-Reader achieved a macro-average F1 ranging from 0.94 to 0.98 across 6 SBDH categories. For clinically relevant adverse attributes, F1 ranged from 0.96 (employment; housing) to 0.99 (tobacco use). When extracting any adverse attributes across all SBDH categories, SBDH-Reader achieved an F1 of 0.97, recall of 0.97, and precision of 0.98 in the independent validation set.
Discussion SBDH-Reader demonstrated strong performance in extracting structured SBDH data through effective prompt engineering of a general-purpose LLM, without the need for task-specific fine-tuning. Its modular design and adaptability to diverse datasets and documentation patterns support its applicability in real-world clinical settings.
Conclusion SBDH-Reader has the potential to serve as a scalable and effective method for collecting real-time, patient-level SBDH data to support clinical research and care.
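The abstract reports precision, recall, F1, and macro-average F1 derived from a confusion matrix. As a minimal sketch of how these quantities relate, the Python below computes them for a multi-class confusion matrix. The function names `per_class_metrics` and `macro_f1` and the toy counts are illustrative assumptions; they are not the authors' evaluation code or data.

```python
from typing import List, Tuple


def per_class_metrics(cm: List[List[int]]) -> List[Tuple[float, float, float]]:
    """Return (precision, recall, F1) for each class.

    Assumed convention: cm[i][j] counts items whose true class is i
    and whose predicted class is j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(cm[k]) - tp                       # truly k, predicted another class
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


def macro_f1(cm: List[List[int]]) -> float:
    """Unweighted mean of per-class F1 scores (macro-average)."""
    metrics = per_class_metrics(cm)
    return sum(f1 for _, _, f1 in metrics) / len(metrics)


# Toy 3-class confusion matrix (made-up counts, for illustration only).
toy_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]

if __name__ == "__main__":
    for label, (p, r, f) in enumerate(per_class_metrics(toy_cm)):
        print(f"class {label}: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
    print(f"macro-average F1={macro_f1(toy_cm):.3f}")
```

Macro-averaging weights every class equally regardless of how often it occurs, so rare attribute values count as much as common ones.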