AI-driven speech biomarkers for disease diagnosis and monitoring: a systematic review and meta-analysis

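Medicine; PsycINFO; Disease; Meta-analysis; MEDLINE; Sample size determination; Depression (economics); Cognition; Audiology; Test (biology); Receiver operating characteristic; Diagnostic accuracy; Subgroup analysis; Quality of life (healthcare); Medical diagnosis; Selection bias; Systematic review; Internal medicine; Cognitive impairment; Large sample; Quality assessment; Quality score; Predictive value of tests; Disease severity; Diagnostic odds ratio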

Authors

Yi Yang, Xiao-Yan Zhao, Peng Zhao, David Ying, Junyu Wang, Yihe Jiang, Qiaoqin Wan

Source

Journal: BMJ Evidence-Based Medicine [BMJ]
Date: 2025-10-08; Volume (Issue): 31 (1): 46-56

Links

nih.gov, doi.org

Identifiers

DOI: 10.1136/bmjebm-2025-113759

Abstract

Objective: This study aims to comprehensively review the literature on the use of speech biomarkers in disease diagnosis and monitoring, focusing on recording protocols, speech tasks, speech features and processing algorithms.
Study design: Systematic review and meta-analysis.
Data sources: We conducted a search of six databases: PubMed, Embase, Scopus, Web of Science, PsycINFO and IEEE Xplore, covering studies published from database inception to May 2024.
Main outcome measures: The quality of the included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2) and the Quality Assessment of Prognostic Accuracy Studies (QUAPAS). Pooled sensitivity and specificity were calculated using a random-effects model. Subgroup analyses examined potential sources of heterogeneity, such as disease type, language, speech tasks, features and algorithms.
Results: A total of 96 studies were included, with 83 adopting a cross-sectional design and 50 having sample sizes of fewer than 100 participants. Assessment with QUADAS-2 and QUAPAS revealed that most included studies exhibited a high risk of bias in patient selection and index test domains, while concerns regarding applicability were generally low across studies. These studies covered 20 different diseases, with cognitive disorders, depression and Parkinson’s disease being the most frequently studied. The pooled sensitivity and specificity for diagnostic models were 0.80 (95% CI 0.74 to 0.86) and 0.77 (95% CI 0.69 to 0.84) for psychiatric disorders (11 studies, n=2577); 0.85 (95% CI 0.83 to 0.88) and 0.83 (95% CI 0.79 to 0.86) for cognitive disorders (27 studies, n=2068); and 0.81 (95% CI 0.76 to 0.85) and 0.83 (95% CI 0.78 to 0.88) for movement disorders (20 studies, n=852). Further subgroup analyses identified recording device, language, speech task, speech features and algorithm selection as significant contributors to heterogeneity.
Conclusions: This review and meta-analysis of 96 studies highlights the influence of devices, environments, languages, tasks, features and algorithms on speech model performance across diseases. While speech biomarkers show promise for screening and monitoring (particularly via smartphones), the high risk of bias in many studies, especially in patient selection and index test interpretation, limits the strength of current evidence. Future large-scale, prospective studies are needed to validate generalisability and support clinical implementation.
PROSPERO registration number: CRD42024551962.
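
The abstract states that pooled sensitivity and specificity were calculated with a random-effects model but does not say which estimator was used. As a rough illustration only, the sketch below pools per-study proportions on the logit scale with the DerSimonian-Laird estimator, one common univariate choice; diagnostic accuracy reviews often fit a bivariate model instead, so this should not be read as the authors' method. The function name `pool_logit_random_effects` and the study counts are hypothetical and are not taken from the review.

```python
import math


def pool_logit_random_effects(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions (logit scale).

    events[i] / totals[i] is one study's proportion, e.g. true positives
    divided by the number of diseased participants (sensitivity).
    Returns (pooled, ci_low, ci_high, tau2).
    """
    y, v = [], []
    for e, n in zip(events, totals):
        # Simplification (assumption): a 0.5 continuity correction is applied
        # to every study so the logit stays finite when e == 0 or e == n.
        a, b = e + 0.5, n - e + 0.5
        y.append(math.log(a / b))      # logit of the corrected proportion
        v.append(1.0 / a + 1.0 / b)    # approximate within-study variance

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))

    # Method-of-moments estimate of the between-study variance tau^2.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Back-transform the estimate and its 95% Wald interval to proportions.
    return expit(mu), expit(mu - 1.96 * se), expit(mu + 1.96 * se), tau2


# Hypothetical counts for four studies (not data from the review).
true_positives = [40, 35, 52, 18]
diseased = [50, 42, 60, 25]
sens, ci_low, ci_high, tau2 = pool_logit_random_effects(true_positives, diseased)
print(f"pooled sensitivity {sens:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f}), tau^2 = {tau2:.3f}")
```

Pooled specificity would follow the same pattern, with true negatives as `events` and non-diseased participants as `totals`.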