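Computer science
Mel-frequency cepstrum
Speech recognition
Conversation
Transformer
Emotion recognition
Encoder
Short-term memory
Classifier (UML)
Artificial intelligence
Feature extraction
Artificial neural network
Recurrent neural network
Quantum mechanics
Operating system
Physics
Voltage
Philosophy
Linguistics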
Authors
Felicia Andayani, Lau Bee Theng, Mark Tee Kit Tsun, Caslon Chua
Source
Journal: IEEE Access
[Institute of Electrical and Electronics Engineers]
Date: 2022-01-01
Volume: 10, pages 36018-36027
Citations: 87
Identifiers
DOI:10.1109/access.2022.3163856
Abstract
Emotion is a vital component of daily human communication and helps people understand each other. Emotion recognition plays a crucial role in developing human-computer interaction and computer-based speech emotion recognition. In a nutshell, Speech Emotion Recognition (SER) recognizes emotion signals transmitted through human speech or daily conversation, where the emotions in speech depend strongly on temporal information. Although much existing research has shown that hybrid systems perform better than the traditional single classifiers used in SER, each approach has limitations. This paper therefore proposes a hybrid Long Short-Term Memory (LSTM) network and Transformer encoder to learn the long-term dependencies in speech signals and classify emotions. Speech features are extracted as Mel Frequency Cepstral Coefficients (MFCC) and fed into the proposed hybrid LSTM-Transformer classifier. A range of performance evaluations was conducted on the proposed LSTM-Transformer model. The results indicate that it achieves a significant recognition improvement compared with existing models from other published works. The proposed hybrid model reached 75.62%, 85.55%, and 72.49% recognition success on the RAVDESS, Emo-DB, and language-independent datasets, respectively.
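The MFCC feature-extraction step mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the abstract does not give their frame length, hop size, filter count, or number of coefficients, so the values below (256-sample frames, 128-sample hop, 20 mel filters, 13 coefficients, 16 kHz sampling) and every function name are assumptions chosen for illustration. The LSTM-Transformer classifier that would consume these features is not shown.

```python
import math

def hz_to_mel(hz):
    # Common O'Shaughnessy-style mel formula (one of several conventions).
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hamming-windowed frame, via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points, expressed as (fractional) FFT bin positions.
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(signal, sample_rate=16000, frame_len=256, hop=128,
         n_filters=20, n_coeffs=13):
    """Return one list of n_coeffs cepstral coefficients per frame.

    All default parameter values are illustrative assumptions.
    """
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        # Log mel-band energies, floored to avoid log(0).
        log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                 for row in bank]
        # Unnormalized DCT-II decorrelates the log energies into cepstral coefficients.
        features.append([
            sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_coeffs)
        ])
    return features

# Usage: a synthetic 440 Hz tone stands in for a speech recording.
sr = 16000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(1024)]
feats = mfcc(tone, sr)
print(len(feats), len(feats[0]))  # prints: 7 13
```

Each row of `feats` is one time step, so the result is a sequence of feature vectors, which is the shape a sequence model such as an LSTM or Transformer encoder expects. In practice an FFT-based library routine would replace the naive DFT used here for clarity.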