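Computer science
Speech recognition
Prosody
Phonation
Feature extraction
Classifier (UML)
Artificial intelligence
Pattern recognition (psychology)
Medicine
Audiology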
Authors
N. P. Narendra,Björn W. Schuller,Paavo Alku
Identifier
DOI:10.1109/taslp.2021.3078364
Abstract
Developing automatic methods to detect Parkinson's disease (PD) from speech has attracted increasing interest, as these techniques can potentially be used in telemonitoring health applications. This article studies the utilization of voice source information in the detection of PD using two classifier architectures: a traditional pipeline approach and an end-to-end approach. The former consists of feature extraction and classifier stages. In feature extraction, the baseline acoustic features (articulation, phonation, and prosody features) were computed, and voice source information was extracted using glottal features estimated by the iterative adaptive inverse filtering (IAIF) and quasi-closed phase (QCP) glottal inverse filtering methods. Support vector machine classifiers were developed utilizing the baseline and glottal features extracted from every speech utterance and the corresponding healthy/PD labels. The end-to-end approach uses deep learning models that were trained using both raw speech waveforms and raw voice source waveforms. The voice source waveforms were obtained using two glottal inverse filtering methods (IAIF and QCP) and the zero frequency filtering method. The deep learning architecture consists of a combination of convolutional layers followed by a multilayer perceptron. Experiments were performed using the PC-GITA speech database. Among the traditional pipeline systems, the highest classification accuracy (67.93%) was given by the combination of baseline and QCP-based glottal features. Among the end-to-end systems, the highest accuracy (68.56%) was given by the system trained using QCP-based glottal flow signals. Even though classification accuracies were modest for all systems, the study is encouraging, as the extraction of voice source information was found to be most effective in both approaches.
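The end-to-end architecture described in the abstract (convolutional layers followed by a multilayer perceptron, applied directly to a raw waveform) can be sketched roughly as below. This is only an illustrative, untrained forward pass in NumPy. The layer sizes, strides, ReLU activations, global max pooling, and two-class softmax output are all assumptions made for the sketch and are not taken from the paper. The input is a random stand-in signal; in the study it would be a raw speech waveform or a voice source waveform estimated by IAIF, QCP, or zero frequency filtering, none of which is computed here.

```python
import numpy as np


def conv1d(x, w, b, stride):
    """Valid 1-D convolution. x: (C_in, T), w: (C_out, C_in, K), b: (C_out,)."""
    c_out, _, k = w.shape
    n_frames = (x.shape[1] - k) // stride + 1
    out = np.empty((c_out, n_frames))
    for t in range(n_frames):
        window = x[:, t * stride : t * stride + k]  # (C_in, K)
        # Contract over input channels and kernel taps for every output channel.
        out[:, t] = np.tensordot(w, window, axes=([1, 2], [0, 1])) + b
    return out


def relu(x):
    return np.maximum(x, 0.0)


def init_params(rng):
    """Hypothetical layer sizes, chosen only for illustration."""
    conv = [
        (rng.normal(0, 0.1, (8, 1, 64)), np.zeros(8), 16),  # (weights, bias, stride)
        (rng.normal(0, 0.1, (16, 8, 8)), np.zeros(16), 2),
    ]
    mlp = (
        rng.normal(0, 0.1, (32, 16)), np.zeros(32),  # hidden layer
        rng.normal(0, 0.1, (2, 32)), np.zeros(2),    # output layer: healthy vs. PD
    )
    return {"conv": conv, "mlp": mlp}


def forward(waveform, params):
    """Map a raw 1-D waveform to two class probabilities."""
    h = waveform[np.newaxis, :]  # add a channel axis -> (1, T)
    for w, b, stride in params["conv"]:
        h = relu(conv1d(h, w, b, stride))
    pooled = h.max(axis=1)  # global max pooling over time (an assumption)
    w1, b1, w2, b2 = params["mlp"]
    hidden = relu(w1 @ pooled + b1)
    logits = w2 @ hidden + b2
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


rng = np.random.default_rng(0)
params = init_params(rng)
waveform = rng.normal(size=4000)  # random stand-in for a waveform segment
probs = forward(waveform, params)
print(probs)  # two probabilities; meaningless until the weights are trained
```

Since the weights are random, the output only demonstrates the data flow from waveform to class probabilities. A real system would train these weights on labeled utterances.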