Abstract 134: Use of Large Language Model to Allow Reliable Data Acquisition for International Pediatric Stroke Study


Authors

Kriti Bhayana, Dulin Wang, Xiaoqian Jiang, Stuart Fraser

Source

Journal: Stroke [Lippincott Williams & Wilkins]
Date: 2025-01-30; Volume (issue): 56 (Suppl_1)

Identifier

DOI: 10.1161/str.56.suppl_1.134

Abstract

Introduction: Pediatric stroke research is hindered by limited funding and the relative rarity of the disease. Data sharing in pediatric stroke relies on unreimbursed data entry by clinical investigators at children's hospitals participating in the International Pediatric Stroke Study (IPSS). Large Language Models (LLMs) could reduce investigator workload by automating data entry. In prior research, investigators achieved 94% accuracy using a prompt engineering approach with Generative Pretrained Transformer 4 (GPT4) to complete IPSS subject outcome forms from clinical notes. However, GPT4 performed only moderately (~50% correct) on some of the data questions. In this study, we aimed to use another toolkit, "Instructor", to improve LLM performance in the areas where the prior method achieved less than 90% accuracy.

Methods: This retrospective study used de-identified clinical notes from 50 patients with ischemic stroke who presented to the UTHealth Pediatric Stroke Clinic between January 2020 and July 2023. Each note was run through the offline, HIPAA-compliant LLM GPT4o to answer the questions in the IPSS outcome form. We focused on the sections of the outcome form where the prior approach yielded less than 90% accuracy. We implemented Instructor, a Python library built on Pydantic, to enhance prompt engineering and enforce structured outputs. Accuracy was measured as percent agreement between LLM-generated and investigator-entered data. Simple descriptive statistics were used to compare the accuracy (% correct) of the Instructor method, measured against investigator-entered data, with previously reported results from the traditional prompt engineering method.

Results: We analyzed the neurological deficit severity and post-discharge rehabilitation questions. The Instructor-based method achieved 100% accuracy on both, compared with 46-54% and 26-62%, respectively, for the previous method.

Conclusion: In this study, use of Instructor showed promising results for reliable data retrieval. Next, we will use Instructor to analyze neurological deficit type, follow-up imaging type, and imaging-based findings, and extend this approach to other sections of the IPSS forms. In the future, LLMs may reduce investigator workload and increase the efficiency of observational research on rare, underserved diseases such as pediatric stroke.
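The Methods combine two ideas: constraining the LLM to a fixed, validated answer format (which Instructor does by checking responses against a Pydantic model and re-asking the model on validation errors, via its `max_retries` option), and scoring accuracy as percent agreement with investigator-entered data. The sketch below illustrates both using only the Python standard library. It is not Instructor's API and not the authors' code: the field names, the severity categories, and the functions `parse_answer`, `extract`, and `percent_agreement` are hypothetical, and the stubbed model stands in for a real GPT4o call. With Instructor itself, one would instead declare a Pydantic `BaseModel` and pass it as `response_model` to an Instructor-wrapped client.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


# Hypothetical answer categories; the real IPSS outcome-form options
# are not listed in the abstract.
class DeficitSeverity(str, Enum):
    NONE = "none"
    MILD = "mild"
    MODERATE = "moderate"
    SEVERE = "severe"


@dataclass(frozen=True)
class OutcomeAnswer:
    """Structured answer for two hypothetical outcome-form fields."""
    deficit_severity: DeficitSeverity
    post_discharge_rehab: bool


def parse_answer(raw: str) -> OutcomeAnswer:
    """Validate a raw JSON reply from the model; raise ValueError if invalid."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    rehab = data.get("post_discharge_rehab")
    if not isinstance(rehab, bool):
        raise ValueError("post_discharge_rehab must be true or false")
    # Enum lookup raises ValueError for any value outside the allowed set.
    severity = DeficitSeverity(data.get("deficit_severity"))
    return OutcomeAnswer(severity, rehab)


def extract(call_model: Callable[[str], str], note: str,
            max_retries: int = 2) -> OutcomeAnswer:
    """Query the model, re-asking with the validation error until output is valid."""
    prompt = note
    last_error: Optional[ValueError] = None
    for _ in range(max_retries + 1):
        raw = call_model(prompt)
        try:
            return parse_answer(raw)
        except ValueError as err:
            last_error = err
            # Feed the error back so the next attempt can correct itself.
            prompt = f"{note}\n\nYour previous answer was invalid: {err}. Return valid JSON."
    raise ValueError(f"no valid answer after {max_retries + 1} attempts: {last_error}")


def percent_agreement(model_answers: list, reference_answers: list) -> float:
    """Percent of items where the model answer equals the investigator-entered answer."""
    if not model_answers or len(model_answers) != len(reference_answers):
        raise ValueError("answer lists must be non-empty and the same length")
    matches = sum(m == r for m, r in zip(model_answers, reference_answers))
    return 100.0 * matches / len(model_answers)


if __name__ == "__main__":
    # Stubbed model: the first reply uses an invalid category, the second is valid.
    replies = iter([
        '{"deficit_severity": "very bad", "post_discharge_rehab": true}',
        '{"deficit_severity": "mild", "post_discharge_rehab": true}',
    ])
    answer = extract(lambda prompt: next(replies), "de-identified clinical note")
    print(answer.deficit_severity.value, answer.post_discharge_rehab)  # mild True
    # Toy comparison against investigator-entered values (not study data).
    print(percent_agreement(["mild", "severe", "none", "mild"],
                            ["mild", "severe", "mild", "mild"]))  # 75.0
```

Restricting answers to an enumerated set means an out-of-vocabulary reply is rejected and retried rather than silently stored, which is the property that makes the agreement comparison meaningful.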
