
Clinical Text Data in Machine Learning: Systematic Review

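Machine learning, Computer science, Artificial intelligence, Natural language processing, Bottleneck, Information retrieval, Embedded systems
Authors
Irena Spasić, Goran Nenadić
Source
Journal: JMIR medical informatics [JMIR Publications Inc.]
Volume (issue): 8 (3): e17984-e17984 Citations: 326
Identifier
DOI: 10.2196/17984
Abstract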

Background: Clinical narratives represent the main form of communication within health care, providing a personalized account of patient history and assessments, and offering rich information for clinical decision making. Natural language processing (NLP) has repeatedly demonstrated its feasibility to unlock evidence buried in clinical narratives. Machine learning can facilitate rapid development of NLP tools by leveraging large amounts of text data.

Objective: The main aim of this study was to provide systematic evidence on the properties of text data used to train machine learning approaches to clinical NLP. We also investigated the types of NLP tasks that have been supported by machine learning and how they can be applied in clinical practice.

Methods: Our methodology was based on the guidelines for performing systematic reviews. In August 2018, we used PubMed, a multifaceted interface, to perform a literature search against MEDLINE. We identified 110 relevant studies and extracted information about text data used to support machine learning, NLP tasks supported, and their clinical applications. The data properties considered included their size, provenance, collection methods, annotation, and any relevant statistics.

Results: The majority of datasets used to train machine learning models included only hundreds or thousands of documents. Only 10 studies used tens of thousands of documents, with a handful of studies utilizing more. Relatively small datasets were utilized for training even when much larger datasets were available. The main reason for such poor data utilization is the annotation bottleneck faced by supervised machine learning algorithms. Active learning was explored to iteratively sample a subset of data for manual annotation as a strategy for minimizing the annotation effort while maximizing the predictive performance of the model. Supervised learning was successfully used where clinical codes integrated with free-text notes into electronic health records were utilized as class labels. Similarly, distant supervision was used to utilize an existing knowledge base to automatically annotate raw text. Where manual annotation was unavoidable, crowdsourcing was explored, but it remains unsuitable because of the sensitive nature of data considered. Besides the small volume, training data were typically sourced from a small number of institutions, thus offering no hard evidence about the transferability of machine learning models. The majority of studies focused on text classification. Most commonly, the classification results were used to support phenotyping, prognosis, care improvement, resource management, and surveillance.

Conclusions: We identified the data annotation bottleneck as one of the key obstacles to machine learning approaches in clinical NLP. Active learning and distant supervision were explored as a way of saving the annotation efforts. Future research in this field would benefit from alternatives such as data augmentation and transfer learning, or unsupervised learning, which do not require data annotation.
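
Illustrative note (not part of the published abstract): the active learning idea mentioned in the Results, iteratively choosing which documents to send for manual annotation, can be sketched roughly as below. This is a minimal sketch assuming a least-confidence (uncertainty sampling) strategy; the abstract does not say which sampling strategies the reviewed studies used. The names `least_confident`, `toy_predict_proba`, and `unlabelled_pool` are hypothetical, and the keyword scorer is only a stand-in for a trained classifier.

```python
from typing import Callable, List, Sequence


def least_confident(
    pool: Sequence[str],
    predict_proba: Callable[[str], float],
    batch_size: int,
) -> List[int]:
    """Return indices of the pool documents whose predicted positive-class
    probability is closest to 0.5, i.e. where the model is least certain."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: abs(predict_proba(pool[i]) - 0.5),
    )
    return ranked[:batch_size]


def toy_predict_proba(note: str) -> float:
    """Hypothetical stand-in for a trained model: scores a note by keyword hits."""
    hits = sum(word in note.lower() for word in ("pain", "fever"))
    return {0: 0.1, 1: 0.5, 2: 0.9}[hits]


# Assumed toy pool of unlabelled notes (invented for illustration only).
unlabelled_pool = [
    "No symptoms today.",
    "Reports chest pain.",
    "Fever and pain since Monday.",
    "Mild fever overnight.",
]

# Pick the two notes the stand-in model is least sure about; in a real loop
# these would be annotated, added to the training set, and the model retrained.
to_annotate = least_confident(unlabelled_pool, toy_predict_proba, batch_size=2)
print(to_annotate)
```

In a full loop this selection step would repeat after each retraining round until the annotation budget is spent or performance stops improving.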
