Development of an algorithm using natural language processing to identify metastatic breast cancer patients from clinical notes.

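Sentence; Medicine; Metastatic breast cancer; Artificial intelligence; Natural language processing; Cancer; Computer science; Set (abstract data type); Machine learning; Oncology; Internal medicine; Programming language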

Authors

Krishna Kumar Swaminathan,Emma Mendonca,Pranay Mukherjee,Karpagavalli Thirumalai,Rachel C. Newsome,Babu Narayanan

Source

Journal: Journal of Clinical Oncology [Lippincott Williams & Wilkins]
Date: 2020-05-20 Volume (issue): 38 (15_suppl): e14056-e14056 Citations: 2

Identifier

DOI: 10.1200/jco.2020.38.15_suppl.e14056

Abstract

e14056

Background: Determining a patient's metastatic status is important for outcomes research and for assessing candidacy for clinical trials. Structured data in the EMR may not always capture metastatic status, so it is useful to extract it automatically from physician notes. Contextual understanding of the notes is needed to resolve issues such as: a) local vs distal metastasis; b) statements involving family history of metastasis, or a physician instructing the patient to look for certain signs of metastasis; c) text indicating suspicion of metastasis or absence of metastasis; d) indirect utterances, e.g. "cancer has spread to the bone"; e) corrections to previous findings.

Methods: We used a set of 20138 breast cancer patients from the Concerto HealthAI real-world oncology dataset, which includes data from CancerLinQ Discovery, to build and validate the set of NLP algorithms. 5300 sentences from 1500 patients were annotated, and the algorithms were manually validated by data abstractors for 500 patients. The algorithms developed were the following: 1) classification of a sentence into 3 classes: Distal/Local metastasis, Suspicious and Other; 2) classification of a sentence into 2 classes: Distal or Local; 3) classification of a patient into 2 classes: Distal metastasis or not distal metastasis; 4) multi-label classification for detecting sites of metastasis. Sentence-level algorithms were built using deep learning, and patient-level aggregation of sentence-level predictions was done using machine learning approaches that included temporal features. A pretrained ULMFiT model was fine-tuned on Concerto HealthAI's corpus for the sentence classification tasks.

Results: At the sentence level, we obtained an accuracy of 0.85 for the distal/local vs suspicious vs irrelevant model and 0.97 for the distal vs not distal metastasis model. Our patient-level metrics are shown in the table. The classes used for sites of metastasis are Brain, Bone, Lung, Liver, Distant Lymph nodes and Unknown sites. A subset accuracy (mean fraction of labels that match) of 0.93 was obtained on the hold-out test set at the patient level.

Conclusions: Metastatic status and site of metastasis can be reliably extracted automatically from clinical notes using deep learning techniques. This information will be valuable for clinical trial matching, outcomes research and other applications. [Table: see text]
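The abstract reports "subset accuracy" and glosses it as the mean fraction of labels that match. In multi-label evaluation these are often treated as two different quantities: exact-match subset accuracy, and a per-label agreement score (one minus the Hamming loss). The abstract does not say which computation was used, so the sketch below simply shows both on toy data. The function names and the example patients are hypothetical and are not taken from the study; only the six site labels come from the abstract.

```python
from typing import Sequence, Set

# Site labels as listed in the abstract.
SITES = ("Brain", "Bone", "Lung", "Liver", "Distant Lymph nodes", "Unknown sites")


def _check(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]]) -> None:
    """Validate that both inputs describe the same non-empty set of patients."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one patient is required")


def subset_accuracy(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]]) -> float:
    """Fraction of patients whose predicted site set equals the true set exactly."""
    _check(y_true, y_pred)
    exact = sum(1 for t, p in zip(y_true, y_pred) if set(t) == set(p))
    return exact / len(y_true)


def label_match_fraction(
    y_true: Sequence[Set[str]],
    y_pred: Sequence[Set[str]],
    labels: Sequence[str] = SITES,
) -> float:
    """Mean over patients of the fraction of labels predicted correctly.

    A label counts as correct when it is present in both sets or absent
    from both. This equals 1 minus the Hamming loss.
    """
    _check(y_true, y_pred)
    total = 0.0
    for t, p in zip(y_true, y_pred):
        agree = sum(1 for lab in labels if (lab in t) == (lab in p))
        total += agree / len(labels)
    return total / len(y_true)


if __name__ == "__main__":
    # Toy, made-up patients (NOT study data): true vs predicted metastatic sites.
    truth = [{"Bone"}, {"Bone", "Liver"}, {"Brain"}, set()]
    pred = [{"Bone"}, {"Bone"}, {"Brain", "Lung"}, set()]
    print("subset accuracy:", subset_accuracy(truth, pred))
    print("label match fraction:", label_match_fraction(truth, pred))
```

On the toy data, two of four patients match exactly, while each of the other two differs on only one of the six labels, so the per-label score is much higher than the exact-match score. Which definition is behind a reported figure therefore matters when comparing results.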
