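Computer science
Transformer
Encoder
Feature extraction
Artificial neural network
Speech recognition
Recurrent neural network
Artificial intelligence
Pattern recognition (psychology)
Short-term memory
Residual neural network
Coding (social sciences)
Time delay neural network
Voltage
Physics
Operating system
Statistics
Quantum mechanics
Mathematics
Authors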
Siqi Han,Feng Leng,Zitong Jin
Identifier
DOI:10.1109/cisce52179.2021.9445906
Abstract
As a challenging pattern recognition task, speech emotion recognition has attracted increasing attention in recent years and is widely used in medicine, affective computing, and other fields. In this paper, we propose a parallel ResNet-CNN-Transformer Encoder network. The ResNet is used to alleviate the problems caused by deepening the network. The CNN uses fewer parameters to increase the fitting and expressive capacity of the network. Because traditional recurrent neural networks suffer from long-term dependency problems when extracting features from speech and text sequences, and their sequential nature prevents them from capturing long-distance features, the multi-head attention mechanism of the Transformer encoder layer is used to process the sequence in parallel, improving processing speed and extracting the emotional semantic information in the sequence. Experiments are carried out on the RAVDESS dataset. The results demonstrate the effectiveness of the proposed method and show a significant improvement over previous results.
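The abstract's point that attention handles a sequence in parallel and reaches long-distance features can be illustrated with a minimal sketch of multi-head self-attention. This is not the authors' implementation: the function names, the toy `frames` input, and the use of identity projections in place of learned weight matrices are all assumptions made for illustration, and the ResNet and CNN branches are not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(queries, keys, values):
    """Scaled dot-product attention for one head.

    Each argument is a list of equal-length vectors, one per time step.
    Returns (outputs, weights); weights[i][j] is how much step i attends to step j.
    """
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        # Every step is scored against every other step, regardless of distance.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        out = [
            sum(w[j] * values[j][d] for j in range(len(values)))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def multi_head_self_attention(frames, num_heads):
    """Split each frame's features into num_heads slices, attend per slice, concatenate.

    Assumption: identity projections stand in for the learned query, key,
    value, and output matrices of a real Transformer encoder layer.
    """
    dim = len(frames[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    merged = [[] for _ in frames]
    all_weights = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        part = [f[lo:hi] for f in frames]
        outs, w = attention_head(part, part, part)
        for t, out in enumerate(outs):
            merged[t].extend(out)
        all_weights.append(w)
    return merged, all_weights


# Hypothetical toy input: three time steps with four features each.
frames = [
    [1.0, 0.0, 0.5, 0.5],
    [0.0, 1.0, 0.5, 0.5],
    [1.0, 0.0, 0.5, 0.5],
]
outputs, weights = multi_head_self_attention(frames, num_heads=2)
```

In this toy input the first and third steps are identical, so in the first head step 0 attends to step 2 exactly as strongly as to itself, even though another step lies between them. No step's output depends on a previously computed hidden state, which is what allows the per-step computations to run in parallel, unlike a recurrent network.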