Authors

Qingwen Yang, Jiahui Jiang, Xue Dong, Huai Yang, Qi Wang, Zhenghan Yang, Dawei Yang, Peng Liu

Abstract
The free-text format is widely used in radiology reports for its expressive flexibility; however, its unstructured nature leaves substantial amounts of report data underutilized. A natural language processing (NLP) model that automatically extracts information from free-text radiology reports can contribute significantly to building structured databases, thereby optimizing data utilization. This study aimed to perform a systematic review and meta-analysis evaluating the performance of NLP systems in extracting information from free-text radiology reports.

A systematic literature search was conducted from November 21 to 23, 2024, in PubMed/MEDLINE, Embase, EBSCO, Ovid, Web of Science, and the Cochrane Library. Study quality was assessed using the QUADAS-2 tool. A bivariate random-effects model was applied to obtain pooled sensitivity, specificity, diagnostic odds ratio (DOR), positive likelihood ratio (PLR), negative likelihood ratio (NLR), and area under the summary receiver operating characteristic curve (AUC). Subgroup analyses (e.g., NLP model type, dataset source, and language) and a random-effects multivariable meta-regression based on the restricted maximum likelihood (REML) method were conducted to explore potential sources of heterogeneity. Sensitivity analyses (excluding high-risk studies, the leave-one-out method, and comparison of data-integration strategies) were performed to assess the robustness of the findings.

A total of 28 studies were included in the final analysis, covering 421,692 extracted entities in 51,187 free-text radiology reports. NLP systems achieved high pooled sensitivity (91% [95% CI: 87, 93]) and specificity (96% [95% CI: 93, 97]), with a DOR of 220 (95% CI: 112, 435) and an AUC of 0.98 (95% CI: 0.96, 0.99).
Subgroup analysis revealed significantly better performance for extracting single anatomical sites (AUC 0.99; 95% CI: 0.97, 0.99) than for multiple sites (AUC 0.95; 95% CI: 0.93, 0.97; p = 0.001). No significant differences were observed across NLP model types, dataset sources, external validation, languages, or imaging modalities. Multivariable meta-regression identified anatomical site as the only significant contributor to heterogeneity (coefficient = 2.26; 95% CI: 0.25, 4.27; p = 0.027). Sensitivity analyses confirmed the robustness of the findings, and no evidence of publication bias was detected. NLP models demonstrated excellent performance in extracting information from free-text radiology reports; however, the observed heterogeneity highlights the need for enhanced report standardization and improved model generalizability.
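To illustrate how the summary metrics relate, the sketch below applies the standard point-estimate identities linking sensitivity and specificity to PLR, NLR, and DOR. This is only an arithmetic illustration under the assumption of fixed point estimates; the paper's pooled values come from a bivariate random-effects model, so the DOR derived here (about 243 from sens = 0.91, spec = 0.96) differs from the model-based pooled DOR of 220. The function name is illustrative, not from the study.

```python
def diagnostic_summary(sens, spec):
    """Derive PLR, NLR, and DOR from sensitivity and specificity.

    Standard identities:
        PLR = sens / (1 - spec)
        NLR = (1 - sens) / spec
        DOR = PLR / NLR
    Note: the paper pools these via a bivariate random-effects model,
    so these point-estimate values differ slightly from its results.
    """
    plr = sens / (1 - spec)   # positive likelihood ratio
    nlr = (1 - sens) / spec   # negative likelihood ratio
    dor = plr / nlr           # diagnostic odds ratio
    return plr, nlr, dor


# Using the pooled point estimates reported in the abstract:
plr, nlr, dor = diagnostic_summary(sens=0.91, spec=0.96)
print(f"PLR = {plr:.2f}, NLR = {nlr:.3f}, DOR = {dor:.1f}")
```

The gap between this naive DOR and the pooled DOR of 220 reflects the bivariate model's joint handling of between-study variation in sensitivity and specificity rather than simple arithmetic on the marginal point estimates.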