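Computer science
Vocal tract
Speech recognition
Perception
Mel-frequency cepstrum
Support vector machine
Artificial intelligence
Speech perception
Convolutional neural network
Feature extraction
Pattern recognition (psychology)
Natural language processing
Psychology
Neuroscience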
Authors
Ming‐Hao Du,Shuang Liu,Tao Wang,Wenquan Zhang,Yufeng Ke,Long Chen,Dong Ming
Identifier
DOI:10.1016/j.jad.2022.11.060
Abstract
The growing number of patients with depression puts great pressure on clinical diagnosis. Audio-based diagnosis is a helpful auxiliary tool for early mass screening. However, current methods consider only speech perception features and ignore changes in patients' vocal tracts, which may partly account for their poor recognition performance.
This work proposes a novel machine speech chain model for depression recognition (MSCDR) that captures text-independent depressive speech representations along the chain from the speaker's mouth to the listener's ear in order to improve recognition performance. In the proposed MSCDR, linear predictive coding (LPC) and Mel-frequency cepstral coefficient (MFCC) features are extracted to describe the processes of speech production and speech perception, respectively. A one-dimensional convolutional neural network and a long short-term memory network then capture, in sequence, the intra-segment and inter-segment dynamic depressive features used for classification.
We tested the MSCDR on two public datasets that differ in language and paradigm: the Distress Analysis Interview Corpus-Wizard of Oz and the Multi-modal Open Dataset for Mental-disorder Analysis. On these two datasets, the MSCDR achieved accuracies of 0.77 and 0.86 and average F1 scores of 0.75 and 0.86, respectively, outperforming the existing methods it was compared against. This improvement reveals the complementarity of speech production and perception features in carrying depressive information.
The sample size was relatively small, which may limit clinical translation to some extent.
This experiment demonstrates the good generalization ability and superiority of the proposed MSCDR and suggests that vocal tract changes in patients with depression deserve attention in audio-based depression diagnosis.
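The two feature streams in the abstract (LPC for speech production, MFCC for speech perception) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It assumes 16 kHz mono audio, 25 ms Hamming-windowed frames with a 10 ms hop, an LPC order of 12, 26 mel filters and 13 MFCCs, and the mel mapping 2595·log10(1 + f/700). None of these settings come from the paper, and the helper names (`frame_signal`, `lpc`, `mfcc`, `chain_features`, `segment`) are hypothetical. LPC is computed here with the standard autocorrelation method and Levinson-Durbin recursion. The per-frame LPC and MFCC vectors are concatenated and grouped into fixed-length segments, which stand in for the units a 1D CNN would process before an LSTM runs across them.

```python
import math

SR, FRAME_LEN, HOP, N_FFT = 16000, 400, 160, 512  # assumed: 16 kHz, 25 ms frames, 10 ms hop


def frame_signal(x, frame_len=FRAME_LEN, hop=HOP):
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def hamming(frame):
    n = len(frame)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, s in enumerate(frame)]


def lpc(frame, order=12):
    """Autocorrelation-method LPC via Levinson-Durbin.

    Returns [1, a1, ..., ap] for the inverse filter A(z) = 1 + sum_k a_k z^-k,
    an all-pole model of the vocal-tract filter (speech production side).
    """
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predicted frame: stop the recursion
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # remaining prediction-error energy
    return a


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame, n_fft=N_FFT):
    """Naive DFT power spectrum for bins 0..n_fft//2 (an FFT would be used in practice)."""
    x = list(frame[:n_fft]) + [0.0] * max(0, n_fft - len(frame))
    cos_t = [math.cos(2 * math.pi * j / n_fft) for j in range(n_fft)]
    sin_t = [math.sin(2 * math.pi * j / n_fft) for j in range(n_fft)]
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x[n] * cos_t[(k * n) % n_fft] for n in range(n_fft))
        im = sum(x[n] * sin_t[(k * n) % n_fft] for n in range(n_fft))
        spec.append((re * re + im * im) / n_fft)
    return spec


def mfcc(frame, sr=SR, n_fft=N_FFT, n_mels=26, n_mfcc=13):
    """Power spectrum -> triangular mel filterbank -> log -> DCT-II (speech perception side)."""
    spec = power_spectrum(frame, n_fft)
    top = hz_to_mel(sr / 2)
    edges = [int((n_fft + 1) * mel_to_hz(top * i / (n_mels + 1)) / sr) for i in range(n_mels + 2)]
    log_e = []
    for m in range(1, n_mels + 1):
        lo, c, hi = edges[m - 1], edges[m], edges[m + 1]
        e = 0.0
        for k in range(lo, hi + 1):
            if k <= c:
                w = (k - lo) / (c - lo) if c > lo else 1.0  # rising edge of the triangle
            else:
                w = (hi - k) / (hi - c)  # falling edge of the triangle
            e += w * spec[k]
        log_e.append(math.log(max(e, 1e-10)))  # floor avoids log(0)
    return [sum(le * math.cos(math.pi * q * (m + 0.5) / n_mels) for m, le in enumerate(log_e))
            for q in range(n_mfcc)]


def chain_features(signal, lpc_order=12):
    """Per-frame vector: LPC coefficients a1..ap (production) followed by MFCCs (perception)."""
    feats = []
    for fr in frame_signal(signal):
        w = hamming(fr)
        feats.append(lpc(w, lpc_order)[1:] + mfcc(w))
    return feats


def segment(feats, seg_frames=4):
    """Group frames into non-overlapping segments; a trailing partial segment is dropped."""
    return [feats[i:i + seg_frames] for i in range(0, len(feats) - seg_frames + 1, seg_frames)]
```

For example, 0.1 s of audio at 16 kHz (1600 samples) yields 8 frames of 12 + 13 = 25 values each, which `segment(feats, 4)` groups into 2 segments. In a model like the one described, a 1D CNN would then summarize each segment and an LSTM would model the sequence of segment summaries; that network is not sketched here.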