Computer science
Artificial intelligence
Convolutional neural network
Deep learning
Pattern
Feature extraction
Context (archaeology)
Modality (human-computer interaction)
Artificial neural network
Recurrent neural network
Task (project management)
Feature (linguistics)
Pattern recognition (psychology)
Speech recognition
Machine learning
Sociology
Management
Economics
Paleontology
Philosophy
Biology
Linguistics
Social science
Authors
Panagiotis Tzirakis,George Trigeorgis,Mihalis A. Nicolaou,Björn Schuller,Stefanos Zafeiriou
Source
Journal: IEEE Journal of Selected Topics in Signal Processing
[Institute of Electrical and Electronics Engineers]
Date: 2017-12-01
Volume/Issue: 11 (8): 1301-1309
Citations: 492
Identifier
DOI:10.1109/jstsp.2017.2764438
Abstract
Automatic affect recognition is a challenging task due to the various modalities through which emotions can be expressed. Applications can be found in many domains, including multimedia retrieval and human-computer interaction. In recent years, deep neural networks have been used with great success in determining emotional states. Inspired by this success, we propose an emotion recognition system using auditory and visual modalities. To capture the emotional content across various styles of speaking, robust features need to be extracted. To this purpose, we utilize a Convolutional Neural Network (CNN) to extract features from the speech, while for the visual modality we employ a deep residual network (ResNet) of 50 layers. In addition to the importance of feature extraction, a machine learning algorithm also needs to be insensitive to outliers while being able to model the context. To tackle this problem, Long Short-Term Memory (LSTM) networks are utilized. The system is then trained in an end-to-end fashion where, by also taking advantage of the correlations of each of the streams, we manage to significantly outperform the traditional approaches based on auditory and visual handcrafted features for the prediction of spontaneous and natural emotions on the RECOLA database of the AVEC 2016 research challenge on emotion recognition.
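The architecture described in the abstract (a speech CNN and a 50-layer visual ResNet whose features are fused and modeled over time by an LSTM) can be sketched roughly as follows. This is a minimal illustrative PyTorch sketch, not the authors' implementation: the CNN kernel sizes, the 640-sample frame length, the hidden size, and the assumption that 2048-dim ResNet-50 pooled features are precomputed per video frame are all hypothetical choices for illustration.

```python
import torch
import torch.nn as nn

class AVEmotionNet(nn.Module):
    """Hedged sketch of an end-to-end audio-visual emotion model."""

    def __init__(self, audio_dim=640, visual_dim=2048, hidden=256):
        super().__init__()
        # Speech branch: a small 1-D CNN over raw-waveform frames
        # (stand-in for the paper's speech CNN; sizes are illustrative).
        self.audio_cnn = nn.Sequential(
            nn.Conv1d(1, 40, kernel_size=20, stride=10), nn.ReLU(),
            nn.AdaptiveAvgPool1d(16), nn.Flatten())  # -> 40 * 16 = 640 dims
        # Visual branch: the paper uses a 50-layer ResNet; here we assume
        # 2048-dim pooled ResNet-50 features arrive precomputed per frame.
        self.fuse = nn.Linear(audio_dim + visual_dim, hidden)
        # Temporal model: the LSTM captures context over the fused stream.
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # continuous arousal and valence

    def forward(self, wave, visual):
        # wave: (B, T, samples_per_frame); visual: (B, T, visual_dim)
        B, T, S = wave.shape
        a = self.audio_cnn(wave.reshape(B * T, 1, S)).reshape(B, T, -1)
        x = torch.relu(self.fuse(torch.cat([a, visual], dim=-1)))
        h, _ = self.lstm(x)
        return self.head(h)  # (B, T, 2): one prediction per time step

model = AVEmotionNet()
out = model(torch.randn(2, 5, 640), torch.randn(2, 5, 2048))
print(out.shape)  # torch.Size([2, 5, 2])
```

Because the whole stack is differentiable, a regression loss on the per-step outputs back-propagates through the LSTM into both feature extractors, which is what "trained in an end-to-end fashion" refers to.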