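Recall
Computer science
Speech recognition
Recurrent neural network
Emotion recognition
Confusion
Sample (material)
Artificial intelligence
Pattern recognition (psychology)
Artificial neural network
Convolutional neural network
Psychology
Cognitive psychology
Psychoanalysis
Chromatography
Chemistry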
Authors
Zijiang Zhu, Weihuang Dai, Junshan Li
Identifier
DOI:10.1016/j.patrec.2020.11.009
Abstract
To address inconsistent sample durations and imbalanced sample categories in speech emotion corpora, this paper proposes a speech emotion recognition model based on Bi-GRU (Bidirectional Gated Recurrent Unit) and Focal Loss. The model is an improvement built on an in-depth study of CRNN (Convolutional Recurrent Neural Network). Within the CRNN, Bi-GRU is used to effectively lengthen speech samples of short duration, and the Focal Loss function is used to handle the classification difficulties caused by the imbalance of emotion categories among the samples. Different methods are compared experimentally, with weighted average recall (WAR), unweighted average recall (UAR), and the confusion matrix (CM) serving as evaluation metrics. The experimental results show that the proposed model improves recognition accuracy and mitigates the effect of sample imbalance in the IEMOCAP database, and they demonstrate that the improvement in speech emotion recognition performance is not due to adjustments of model parameters or changes to the model topology.
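The loss and the evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used focal loss form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), and the usual definitions of WAR (recall weighted by class support, which equals overall accuracy) and UAR (plain mean of per-class recalls). The function names, the default `gamma=2.0`, and the optional `alpha` weights are illustrative assumptions; the abstract does not state the values the authors used.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for a single sample (illustrative, not the authors' code).

    probs  : predicted class probabilities, summing to 1
    target : index of the true class
    gamma  : focusing parameter (assumed default; 0 gives plain cross-entropy)
    alpha  : optional per-class weights (assumed; None means no weighting)
    """
    p_t = probs[target]
    weight = 1.0 if alpha is None else alpha[target]
    # Well-classified samples (p_t near 1) are down-weighted by (1 - p_t)^gamma.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the true class, columns index the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def war_uar(cm):
    """Return (WAR, UAR) computed from a confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    war = correct / total  # support-weighted recall equals overall accuracy
    # Per-class recall, skipping classes with no true samples.
    recalls = [cm[i][i] / sum(cm[i]) for i in range(len(cm)) if sum(cm[i]) > 0]
    uar = sum(recalls) / len(recalls)
    return war, uar
```

On an imbalanced label set, WAR is dominated by the majority class while UAR treats every class equally, which is why the two are typically reported together for corpora such as IEMOCAP.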