Authors
Yi Yang, Xiao-Yan Zhao, Peng Zhao, David Ying, Junyu Wang, Yihe Jiang, Qiaoqin Wan
Abstract
Objective This study aims to comprehensively review the literature on the use of speech biomarkers in disease diagnosis and monitoring, focusing on recording protocols, speech tasks, speech features and processing algorithms.
Study design Systematic review and meta-analysis.
Data sources We conducted a search of six databases: PubMed, Embase, Scopus, Web of Science, PsycINFO and IEEE Xplore, covering studies published from database inception to May 2024.
Main outcome measures The quality of the included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2) and the Quality Assessment of Prognostic Accuracy Studies (QUAPAS). Pooled sensitivity and specificity were calculated using a random-effects model. Subgroup analyses examined potential sources of heterogeneity, such as disease type, language, speech tasks, features and algorithms.
Results A total of 96 studies were included, with 83 adopting a cross-sectional design and 50 having sample sizes of fewer than 100 participants. Assessment with QUADAS-2 and QUAPAS revealed that most included studies exhibited a high risk of bias in patient selection and index test domains, while concerns regarding applicability were generally low across studies. These studies covered 20 different diseases, with cognitive disorders, depression and Parkinson’s disease being the most frequently studied. The pooled sensitivity and specificity for diagnostic models were 0.80 (95% CI 0.74 to 0.86) and 0.77 (95% CI 0.69 to 0.84) for psychiatric disorders (11 studies, n=2577); 0.85 (95% CI 0.83 to 0.88) and 0.83 (95% CI 0.79 to 0.86) for cognitive disorders (27 studies, n=2068); and 0.81 (95% CI 0.76 to 0.85) and 0.83 (95% CI 0.78 to 0.88) for movement disorders (20 studies, n=852). Further subgroup analyses identified recording device, language, speech task, speech features and algorithm selection as significant contributors to heterogeneity.
Conclusions This review and meta-analysis of 96 studies highlights the influence of devices, environments, languages, tasks, features and algorithms on speech model performance across diseases. While speech biomarkers show promise for screening and monitoring—particularly via smartphones—the high risk of bias in many studies, especially in patient selection and index test interpretation, limits the strength of current evidence. Future large-scale, prospective studies are needed to validate generalisability and support clinical implementation. PROSPERO registration number CRD42024551962.
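Illustrative note The abstract states that pooled sensitivity and specificity were calculated using a random-effects model but does not name the estimator. The sketch below shows one common option as an assumption, not the authors' method: univariate DerSimonian-Laird pooling of logit-transformed proportions. Diagnostic accuracy meta-analyses often fit a bivariate model instead. The function name `pool_logit_random_effects` and all counts are hypothetical and are not taken from the included studies.

```python
import math


def pool_logit_random_effects(events, totals):
    """Pool proportions with a DerSimonian-Laird random-effects model.

    events[i] / totals[i] is one study's proportion, e.g. true positives
    over all diseased participants for sensitivity.
    Returns (pooled, ci_low, ci_high, tau2).
    """
    y, v = [], []
    for e, n in zip(events, totals):
        # A 0.5 continuity correction keeps the logit finite at 0% or 100%.
        a, b = e + 0.5, (n - e) + 0.5
        y.append(math.log(a / b))      # logit of the proportion
        v.append(1.0 / a + 1.0 / b)    # approximate within-study variance

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # Between-study variance tau^2, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))

    def inv_logit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se), tau2


# Hypothetical counts for three studies (illustration only).
true_pos, diseased = [40, 45, 30], [50, 52, 40]
true_neg, healthy = [35, 50, 28], [45, 60, 38]

sens = pool_logit_random_effects(true_pos, diseased)
spec = pool_logit_random_effects(true_neg, healthy)
print("pooled sensitivity: %.2f (95%% CI %.2f to %.2f)" % sens[:3])
print("pooled specificity: %.2f (95%% CI %.2f to %.2f)" % spec[:3])
```

Pooling on the logit scale keeps the back-transformed confidence limits inside the 0 to 1 range, which matters when study-level sensitivities sit close to 1.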