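Computer science
Annotation
Workflow
Information extraction
Information retrieval
Bottleneck
Natural language processing
Relation extraction
Artificial intelligence
Quality (philosophy)
Database
Epistemology
Philosophy
Embedded system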
Authors
Enwei Zhu, Qilin Sheng, Huanwan Yang, Yiyang Liu, Ting Cai, Jinpeng Li
Identifier
DOI:10.1016/j.artmed.2023.102573
Abstract
Medical information extraction consists of a group of natural language processing (NLP) tasks that collaboratively convert clinical text into pre-defined structured formats. This is a critical step in exploiting electronic medical records (EMRs). Given the recent advances in NLP technologies, model implementation and performance no longer seem to be an obstacle; the bottleneck instead lies in obtaining a high-quality annotated corpus and in the overall engineering workflow. This study presents an engineering framework consisting of three tasks: medical entity recognition, relation extraction, and attribute extraction. Within this framework, the whole workflow is demonstrated, from EMR data collection through model performance evaluation. Our annotation scheme is designed to be comprehensive and compatible across the multiple tasks. With EMRs from a general hospital in Ningbo, China, and manual annotation by experienced physicians, our corpus is large in scale and high in quality. Built upon this Chinese clinical corpus, the medical information extraction system shows performance that approaches human annotation. The annotation scheme, (a subset of) the annotated corpus, and the code are all publicly released to facilitate further research.