Clinical Text Data in Machine Learning: Systematic Review

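Machine learning, Computer science, Artificial intelligence, Natural language processing, Bottleneck, Information retrieval, Embedded system
Authors
Irena Spasić, Goran Nenadić
Source
Journal: JMIR Medical Informatics [JMIR Publications]
Volume (issue): 8 (3): e17984 Citations: 326
Identifier
DOI: 10.2196/17984
Abstract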

Background: Clinical narratives represent the main form of communication within health care, providing a personalized account of patient history and assessments, and offering rich information for clinical decision making. Natural language processing (NLP) has repeatedly demonstrated its feasibility to unlock evidence buried in clinical narratives. Machine learning can facilitate rapid development of NLP tools by leveraging large amounts of text data.

Objective: The main aim of this study was to provide systematic evidence on the properties of text data used to train machine learning approaches to clinical NLP. We also investigated the types of NLP tasks that have been supported by machine learning and how they can be applied in clinical practice.

Methods: Our methodology was based on the guidelines for performing systematic reviews. In August 2018, we used PubMed, a multifaceted interface, to perform a literature search against MEDLINE. We identified 110 relevant studies and extracted information about text data used to support machine learning, NLP tasks supported, and their clinical applications. The data properties considered included their size, provenance, collection methods, annotation, and any relevant statistics.

Results: The majority of datasets used to train machine learning models included only hundreds or thousands of documents. Only 10 studies used tens of thousands of documents, with a handful of studies utilizing more. Relatively small datasets were utilized for training even when much larger datasets were available. The main reason for such poor data utilization is the annotation bottleneck faced by supervised machine learning algorithms. Active learning was explored to iteratively sample a subset of data for manual annotation as a strategy for minimizing the annotation effort while maximizing the predictive performance of the model. Supervised learning was successfully used where clinical codes integrated with free-text notes into electronic health records were utilized as class labels. Similarly, distant supervision was used to utilize an existing knowledge base to automatically annotate raw text. Where manual annotation was unavoidable, crowdsourcing was explored, but it remains unsuitable because of the sensitive nature of data considered. Besides the small volume, training data were typically sourced from a small number of institutions, thus offering no hard evidence about the transferability of machine learning models. The majority of studies focused on text classification. Most commonly, the classification results were used to support phenotyping, prognosis, care improvement, resource management, and surveillance.

Conclusions: We identified the data annotation bottleneck as one of the key obstacles to machine learning approaches in clinical NLP. Active learning and distant supervision were explored as a way of saving the annotation efforts. Future research in this field would benefit from alternatives such as data augmentation and transfer learning, or unsupervised learning, which do not require data annotation.
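
As an illustration of the active learning idea mentioned in the Results, the selection step could be sketched roughly as follows. This is a minimal sketch of uncertainty sampling for a binary classifier, not the method of any study covered by the review. The function names `uncertainty` and `select_for_annotation`, and the example probabilities, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def uncertainty(prob_positive: float) -> float:
    """Binary uncertainty score: 1.0 when p = 0.5, 0.0 when p is 0 or 1."""
    return 1.0 - 2.0 * abs(prob_positive - 0.5)


def select_for_annotation(probs: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k unlabeled documents the model is least sure about.

    `probs` holds the current model's predicted probability of the positive
    class for each document in the unlabeled pool (hypothetical input).
    """
    ranked = sorted(
        range(len(probs)),
        key=lambda i: uncertainty(probs[i]),
        reverse=True,  # most uncertain first
    )
    return ranked[:k]


# Hypothetical predictions for five unlabeled clinical notes.
pool_probs = [0.95, 0.52, 0.10, 0.45, 0.70]
to_annotate = select_for_annotation(pool_probs, k=2)
print(to_annotate)  # [1, 3]: the notes scored 0.52 and 0.45
```

In a full loop, the selected documents would be labeled by an annotator, added to the training set, and the model retrained before the next round of selection, so that annotation effort is spent where the model is least confident.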
