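Computer science
Standardization
F1 score
Artificial intelligence
Stroke (engine)
Random forest
Machine learning
Relationship extraction
Pipeline (software)
Scale (ratio)
Health records
Recall
Medical record
Natural language processing
Information retrieval
Information extraction
Health care
Medicine
Psychology
Internal medicine
Mechanical engineering
Physics
Quantum mechanics
Engineering
Economics
Cognitive psychology
Programming language
Economic growth
Operating system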
Authors
Lin Yang, Xiaoshuo Huang, Jiayang Wang, Xin Yang, Lingling Ding, Zixiao Li, Jiao Li
Identifier
DOI: 10.1016/j.artmed.2023.102552
Abstract
Stroke is one of the leading causes of death and disability worldwide. The National Institutes of Health Stroke Scale (NIHSS) scores in electronic health records (EHRs), which quantitatively describe patients' neurological deficits in evidence-based treatment, are crucial in stroke-related clinical investigations. However, the free-text format and lack of standardization inhibit their effective use. Automatically extracting the scale scores from the clinical free text so that its potential value in real-world studies is realized has become an important goal.
This study aims to develop an automated method to extract scale scores from the free text of EHRs.
We propose a two-step pipeline method to identify NIHSS items and numerical scores and validate its feasibility using a freely accessible critical care database: MIMIC-III (Medical Information Mart for Intensive Care III). First, we utilize MIMIC-III to create an annotated corpus. Then, we investigate possible machine learning methods for two subtasks, NIHSS item and score recognition and item-score relation extraction. In the evaluation, we conduct both task-specific and end-to-end evaluations and compare our method with the rule-based method using precision, recall and F1 scores as evaluation metrics.
We use all available discharge summaries of stroke cases in MIMIC-III. The annotated NIHSS corpus contains 312 cases, 2929 scale items, 2774 scores and 2733 relations. The results show that the best F1-score of our method was 0.9006, which was attained by combining BERT-BiLSTM-CRF and Random Forest, and it outperformed the rule-based method (F1-score = 0.8098). In the end-to-end task, our method could successfully recognize the item “1b level of consciousness questions”, the score “1” and their relation “(‘1b level of consciousness questions’, ‘1’, ‘has value’)” from the sentence “1b level of consciousness questions: said name = 1”, while the rule-based method could not.
The two-step pipeline method we propose is an effective approach to identify NIHSS items, scores and their relations. With its help, clinical investigators can easily retrieve and access structured scale data, thereby supporting stroke-related real-world studies.
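To make the evaluation metrics concrete, the following is a minimal sketch of how precision, recall and F1 could be computed over (item, score, relation) triples in an end-to-end setting. It is not the authors' evaluation code: the function name `prf1`, the exact-match criterion, the set-based deduplication, and the second gold and predicted triples are assumptions made for illustration. Only the "1b level of consciousness questions" triple is taken from the abstract.

```python
from typing import Iterable, Set, Tuple

# (item, score, relation label), following the triple format quoted in the abstract.
Triple = Tuple[str, str, str]


def prf1(gold: Iterable[Triple], predicted: Iterable[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match comparison of triples.

    Assumption: a predicted triple counts as correct only if all three fields
    match a gold triple exactly. Sets are used, so duplicates are ignored.
    """
    gold_set: Set[Triple] = set(gold)
    pred_set: Set[Triple] = set(predicted)
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


gold = [
    ("1b level of consciousness questions", "1", "has value"),  # from the abstract
    ("1a level of consciousness", "0", "has value"),            # hypothetical
]
predicted = [
    ("1b level of consciousness questions", "1", "has value"),  # correct
    ("1a level of consciousness", "1", "has value"),            # hypothetical error: wrong score
]

precision, recall, f1 = prf1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With one of two predictions correct and one of two gold triples recovered, all three values are 0.5. A real evaluation would need to key triples by document and text span so that the same item-score pair appearing in different notes is counted separately.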