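Discriminative model
Computer science
Speech recognition
Spectrogram
Emotion recognition
Recall
Convolutional neural network
Task (project management)
Artificial intelligence
Recurrent neural network
Pattern recognition (psychology)
Artificial neural network
Feature extraction
Feature (linguistics)
Psychology
Cognitive psychology
Linguistics
Philosophy
Management
Economics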
Authors
Mingyi Chen, Xuanji He, Jing Yang, Han Zhang
Identifier
DOI: 10.1109/lsp.2018.2860246
Abstract
Speech emotion recognition (SER) is a difficult task because of the complexity of emotions. SER performance depends heavily on the effectiveness of the emotional features extracted from speech. However, most emotional features are sensitive to emotionally irrelevant factors such as the speaker, speaking style, and environment. In this letter, we hypothesize that computing the deltas and delta-deltas of personalized features not only preserves the effective emotional information but also reduces the influence of emotionally irrelevant factors, thereby reducing misclassification. In addition, SER often suffers from silent frames and emotionally irrelevant frames. Meanwhile, attention mechanisms have shown outstanding performance in learning feature representations relevant to specific tasks. Inspired by this, we propose a three-dimensional attention-based convolutional recurrent neural network that learns discriminative features for SER, taking as input the Mel-spectrogram together with its deltas and delta-deltas. Experiments on the IEMOCAP and Emo-DB corpora demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance in terms of unweighted average recall (UAR).
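The abstract does not state how the deltas are computed. A common choice is the HTK-style regression d_t = sum_{k=1..N} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..N} k^2), applied along the time axis. The NumPy sketch below uses that formula on a log Mel-spectrogram and stacks the static, delta and delta-delta maps as three channels, which is one plausible reading of the "three-dimensional" input. The window half-width N = 2, the edge padding, and the function names are assumptions for illustration, not details taken from the letter.

```python
import numpy as np


def deltas(feat: np.ndarray, n: int = 2) -> np.ndarray:
    """HTK-style regression deltas along the time axis (assumed formula).

    feat: array of shape (num_frames, num_bins), e.g. a log Mel-spectrogram.
    n: half-width of the regression window (assumed value, not from the paper).
    """
    num_frames = feat.shape[0]
    # Repeat the first/last frame so every frame has n neighbours on each side.
    padded = np.pad(feat, ((n, n), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = np.zeros_like(feat, dtype=float)
    for k in range(1, n + 1):
        # padded[n + k + t] is frame t+k and padded[n - k + t] is frame t-k.
        future = padded[n + k : n + k + num_frames]
        past = padded[n - k : n - k + num_frames]
        out += k * (future - past)
    return out / denom


def three_channel_input(log_mel: np.ndarray, n: int = 2) -> np.ndarray:
    """Stack static, delta and delta-delta maps as channels (hypothetical helper)."""
    d1 = deltas(log_mel, n)
    d2 = deltas(d1, n)  # delta-deltas are deltas of the deltas
    # Resulting shape: (num_frames, num_bins, 3).
    return np.stack([log_mel, d1, d2], axis=-1)
```

On a linear ramp the interior deltas equal the slope, and on a constant signal they are zero, which is a quick sanity check of the regression weights.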
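One generic way an attention layer can down-weight silent or emotionally irrelevant frames is attention pooling over frame-level outputs: each frame receives a score, the scores are normalized with a softmax, and the utterance representation is the weighted sum of the frames. The sketch below uses additive (tanh) scoring; the parameter names `W`, `b` and `u` are hypothetical, and this is not necessarily the exact attention layer used in the letter.

```python
import numpy as np


def attention_pool(h: np.ndarray, W: np.ndarray, b: np.ndarray, u: np.ndarray):
    """Additive attention pooling over frames (generic sketch).

    h: (T, D) frame-level features, e.g. recurrent-layer outputs.
    W: (A, D), b: (A,), u: (A,) hypothetical learned parameters.
    Returns the pooled (D,) vector and the (T,) frame weights.
    """
    scores = np.tanh(h @ W.T + b) @ u      # one scalar score per frame, shape (T,)
    scores = scores - scores.max()         # subtract max for numerical stability
    weights = np.exp(scores)
    alpha = weights / weights.sum()        # softmax: non-negative, sums to 1
    pooled = alpha @ h                     # weighted sum of frames, shape (D,)
    return pooled, alpha
```

With `u` set to zeros every score is equal, so the weights are uniform and the pooled vector is simply the mean over frames; training would shift weight toward the frames that help classification.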
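Unweighted average recall (UAR), the metric named in the abstract, is the mean of the per-class recalls, so every emotion class counts equally regardless of how many utterances it contains. A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np


def unweighted_average_recall(y_true, y_pred) -> float:
    """Mean of per-class recalls over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    # Recall of class c: fraction of true-c samples predicted as c.
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))
```

For `y_true = [0, 0, 0, 0, 1, 1]` and `y_pred = [0, 0, 0, 0, 0, 1]`, plain accuracy is 5/6, but UAR is (1.0 + 0.5) / 2 = 0.75, exposing the weaker minority-class result. scikit-learn's `balanced_accuracy_score` computes the same quantity.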