
Clinical Text Data in Machine Learning: Systematic Review

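Machine learning, Computer science, Artificial intelligence, Natural language processing, Bottleneck, Information retrieval, Embedded system
Authors
Irena Spasić, Goran Nenadić
Source
Journal: JMIR Medical Informatics [JMIR Publications]
Volume (Issue): 8 (3): e17984-e17984 Citations: 326
Identifier
DOI: 10.2196/17984
Abstract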

Background: Clinical narratives represent the main form of communication within health care, providing a personalized account of patient history and assessments, and offering rich information for clinical decision making. Natural language processing (NLP) has repeatedly demonstrated its feasibility to unlock evidence buried in clinical narratives. Machine learning can facilitate rapid development of NLP tools by leveraging large amounts of text data.

Objective: The main aim of this study was to provide systematic evidence on the properties of text data used to train machine learning approaches to clinical NLP. We also investigated the types of NLP tasks that have been supported by machine learning and how they can be applied in clinical practice.

Methods: Our methodology was based on the guidelines for performing systematic reviews. In August 2018, we used PubMed, a multifaceted interface, to perform a literature search against MEDLINE. We identified 110 relevant studies and extracted information about text data used to support machine learning, NLP tasks supported, and their clinical applications. The data properties considered included their size, provenance, collection methods, annotation, and any relevant statistics.

Results: The majority of datasets used to train machine learning models included only hundreds or thousands of documents. Only 10 studies used tens of thousands of documents, with a handful of studies utilizing more. Relatively small datasets were utilized for training even when much larger datasets were available. The main reason for such poor data utilization is the annotation bottleneck faced by supervised machine learning algorithms. Active learning was explored to iteratively sample a subset of data for manual annotation as a strategy for minimizing the annotation effort while maximizing the predictive performance of the model. Supervised learning was successfully used where clinical codes integrated with free-text notes into electronic health records were utilized as class labels. Similarly, distant supervision was used to utilize an existing knowledge base to automatically annotate raw text. Where manual annotation was unavoidable, crowdsourcing was explored, but it remains unsuitable because of the sensitive nature of data considered. Besides the small volume, training data were typically sourced from a small number of institutions, thus offering no hard evidence about the transferability of machine learning models. The majority of studies focused on text classification. Most commonly, the classification results were used to support phenotyping, prognosis, care improvement, resource management, and surveillance.

Conclusions: We identified the data annotation bottleneck as one of the key obstacles to machine learning approaches in clinical NLP. Active learning and distant supervision were explored as a way of saving the annotation efforts. Future research in this field would benefit from alternatives such as data augmentation and transfer learning, or unsupervised learning, which do not require data annotation.
