Identifying stroke-related quantified evidence from electronic health records in real-world studies

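Computer science; Standardization; F1 score; Artificial intelligence; Stroke (engine); Random forest; Machine learning; Relation extraction; Pipeline (software); Scale (ratio); Health records; Recall; Medical record; Natural language processing; Information retrieval; Information extraction; Health care; Medicine; Psychology; Internal medicine; Mechanical engineering; Physics; Quantum mechanics; Engineering; Economics; Cognitive psychology; Programming language; Economic growth; Operating system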

Authors

Lin Yang,Xiaoshuo Huang,Jiayang Wang,Xin Yang,Lingling Ding,Zixiao Li,Jiao Li

Source

Journal: Artificial Intelligence in Medicine [Elsevier BV]
Date: 2023-04-23 Volume/Issue: 140: 102552-102552 Citations: 8

Links

nih.gov, doi.org

Identifier

DOI: 10.1016/j.artmed.2023.102552

Abstract

Stroke is one of the leading causes of death and disability worldwide. The National Institutes of Health Stroke Scale (NIHSS) scores in electronic health records (EHRs), which quantitatively describe patients' neurological deficits in evidence-based treatment, are crucial in stroke-related clinical investigations. However, their free-text format and lack of standardization inhibit their effective use. Automatically extracting the scale scores from clinical free text, so that their potential value in real-world studies can be realized, has become an important goal. This study aims to develop an automated method to extract scale scores from the free text of EHRs. We propose a two-step pipeline method to identify NIHSS items and numerical scores and validate its feasibility using a freely accessible critical care database: MIMIC-III (Medical Information Mart for Intensive Care III). First, we utilize MIMIC-III to create an annotated corpus. Then, we investigate possible machine learning methods for two subtasks: NIHSS item and score recognition, and item-score relation extraction. We conduct both task-specific and end-to-end evaluations and compare our method with a rule-based method, using precision, recall and F1 scores as evaluation metrics. We use all available discharge summaries of stroke cases in MIMIC-III. The annotated NIHSS corpus contains 312 cases, 2929 scale items, 2774 scores and 2733 relations. The results show that the best F1-score of our method was 0.9006, which was attained by combining BERT-BiLSTM-CRF and Random Forest, and it outperformed the rule-based method (F1-score = 0.8098). In the end-to-end task, our method could successfully recognize the item “1b level of consciousness questions”, the score “1” and their relation “(‘1b level of consciousness questions’, ‘1’, ‘has value’)” from the sentence “1b level of consciousness questions: said name = 1”, while the rule-based method could not.
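As a rough illustration of the evaluation metrics named above, the following Python sketch computes precision, recall and F1 over sets of (item, score, relation) triples. It assumes exact-match scoring, which the abstract does not specify; the function name `prf1` and the second triple in the toy data are hypothetical and not taken from the paper.

```python
from typing import Set, Tuple

# A relation triple in the form used in the abstract: (item, score, relation).
Triple = Tuple[str, str, str]


def prf1(gold: Set[Triple], predicted: Set[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match scoring (an assumption)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: the first triple comes from the abstract, the second is invented.
gold = {
    ("1b level of consciousness questions", "1", "has value"),
    ("1a level of consciousness", "0", "has value"),
}
predicted = {
    ("1b level of consciousness questions", "1", "has value"),
    ("1a level of consciousness", "1", "has value"),  # wrong score, so no match
}

precision, recall, f1 = prf1(gold, predicted)
print(precision, recall, f1)
```

With one of two predicted triples matching one of two gold triples, all three metrics come out to 0.5 in this toy case.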
The two-step pipeline method we propose is an effective approach to identify NIHSS items, scores and their relations. With its help, clinical investigators can easily retrieve and access structured scale data, thereby supporting stroke-related real-world studies.
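The data flow of such a two-step pipeline can be sketched in Python as follows. This is only a minimal illustration: the two functions are hypothetical stand-ins that use simple string heuristics in place of the BERT-BiLSTM-CRF recognizer and the Random Forest relation classifier described in the abstract, and they are not the paper's rule-based baseline either. The names `Entity`, `recognize_entities` and `extract_relations` are assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entity:
    text: str
    label: str  # "ITEM" or "SCORE"
    start: int
    end: int


def recognize_entities(sentence: str) -> List[Entity]:
    """Step 1 stand-in: a toy heuristic, not the paper's sequence tagger.

    Treats the text before the first colon as an NIHSS item and every
    standalone integer after the colon as a candidate score.
    """
    entities: List[Entity] = []
    colon = sentence.find(":")
    if colon > 0:
        item_text = sentence[:colon].strip()
        start = sentence.index(item_text)
        entities.append(Entity(item_text, "ITEM", start, start + len(item_text)))
    offset = colon + 1  # 0 when there is no colon, so the whole sentence is scanned
    for match in re.finditer(r"\b\d+\b", sentence[offset:]):
        entities.append(
            Entity(match.group(), "SCORE", match.start() + offset, match.end() + offset)
        )
    return entities


def extract_relations(entities: List[Entity]) -> List[Tuple[str, str, str]]:
    """Step 2 stand-in: link each item to the nearest score that follows it."""
    items = [e for e in entities if e.label == "ITEM"]
    scores = [e for e in entities if e.label == "SCORE"]
    triples = []
    for item in items:
        following = [s for s in scores if s.start >= item.end]
        if following:
            nearest = min(following, key=lambda s: s.start - item.end)
            triples.append((item.text, nearest.text, "has value"))
    return triples


sentence = "1b level of consciousness questions: said name = 1"
triples = extract_relations(recognize_entities(sentence))
print(triples)
```

Keeping entity recognition and relation extraction as separate steps means each can be evaluated on its own (task-specific evaluation) and then chained (end-to-end evaluation), and the resulting triples can be stored as structured rows for later retrieval.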
