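Computer science
Transformer
Speech recognition
Emotion recognition
Voice activity detection
Deep learning
Entertainment
Artificial intelligence
Speech processing
Machine learning
Engineering
Electrical engineering
Voltage
Art
Visual arts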
Authors
Ülkü Bayraktar, Hasan Kilimci, H. Hakan Kılınç, Zeynep Hilal Kilimci
Identifier
DOI: 10.1109/isas60782.2023.10391313
Abstract
Speech Emotion Recognition (SER) is a field of research and technology focused on the automatic detection and classification of emotional states conveyed through speech. SER has a wide range of applications, including customer service, healthcare, entertainment, and market research. It also has the potential to enhance human-computer interaction and improve our understanding of human emotional behavior. Previous studies have mostly relied on traditional machine learning algorithms and deep learning architectures to detect speech emotion; this work takes a step further by using transformers, a state-of-the-art class of models. To evaluate the effectiveness of transformer models, Hidden-Unit BERT, Squeezed and Efficient Wav2Vec, Multi-lingual Concatenated transformer, and Audio Spectrogram Transformer models are employed to recognize speech emotion on three publicly available and widely used datasets: EMO-DB, RAVDESS, and TESS. Experimental results demonstrate that the Audio Spectrogram Transformer model achieves remarkable classification results, specifically 75.42% accuracy on EMO-DB, 88.17% accuracy on RAVDESS, and 98.17% accuracy on TESS.