Computer science
Speech recognition
Mechanism (biology)
Emotion recognition
Convolutional neural network
Artificial intelligence
Epistemology
Philosophy
Authors
Chao Li, Jinlong Jiao, Yiqin Zhao, Ziping Zhao
Identifier
DOI: 10.1109/aciiw.2019.8925283
Abstract
Discrete speech emotion recognition (SER), the assignment of a single emotion label to an entire speech utterance, is typically performed as a sequence-to-label task. The predominant approach to SER to date is based on recurrent neural networks, whose success on this task is often linked to their ability to capture unbounded context. In this paper we introduce new gated convolutional networks and apply them to SER; they can be more efficient than recurrent models since they allow parallelization over sequential tokens. We present a novel model architecture that combines a gated convolutional neural network with a temporal attention-based localization method for speech emotion recognition. To the best of the authors' knowledge, this is the first time such a hybrid architecture has been employed for SER. We demonstrate the effectiveness of our approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus, where the experimental results show that our proposed model outperforms current state-of-the-art approaches.
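As a rough illustration of the idea the abstract describes, the PyTorch sketch below combines a gated convolution (a convolution modulated by a sigmoid gate) with temporal attention pooling to map a variable-length sequence of acoustic frames to a single emotion label. The layer sizes, kernel width, 40-dimensional frame features, and 4-class output are illustrative assumptions, not the authors' exact architecture or hyperparameters.

import torch
import torch.nn as nn

class GatedConvAttnSER(nn.Module):
    def __init__(self, n_feats=40, channels=128, n_classes=4):
        super().__init__()
        # Gated convolution: conv output elementwise-modulated by a sigmoid gate.
        self.conv = nn.Conv1d(n_feats, channels, kernel_size=5, padding=2)
        self.gate = nn.Conv1d(n_feats, channels, kernel_size=5, padding=2)
        # Temporal attention: one score per frame, normalized over time.
        self.attn = nn.Linear(channels, 1)
        self.out = nn.Linear(channels, n_classes)

    def forward(self, x):                       # x: (batch, time, n_feats)
        x = x.transpose(1, 2)                   # -> (batch, n_feats, time)
        h = self.conv(x) * torch.sigmoid(self.gate(x))    # gated activation
        h = h.transpose(1, 2)                   # -> (batch, time, channels)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights over frames
        utt = (w * h).sum(dim=1)                # weighted pooling to one utterance vector
        return self.out(utt)                    # emotion logits per utterance

# Example: a batch of 2 utterances, 300 frames of 40-dim features each.
logits = GatedConvAttnSER()(torch.randn(2, 300, 40))

Because the gated convolutions have no recurrent state, all frames of an utterance can be processed in parallel, which is the efficiency advantage over recurrent models mentioned in the abstract; the attention weights then localize the frames most relevant to the emotion decision.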