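Generalization
Computer science
Artificial intelligence
Domain (mathematical analysis)
Pattern recognition (psychology)
Speech recognition
Facial expression
Natural language processing
Machine learning
Mathematics
Mathematical analysis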
Authors
Elena Ryumina,Denis Dresvyanskiy,Alexey Karpov
Source
Journal: Neurocomputing
[Elsevier BV]
Date: 2022-10-07
Volume: 514, pages 435-450
Citations: 51
Identifier
DOI:10.1016/j.neucom.2022.10.013
Abstract
Many researchers have sought a robust emotion recognition system over the last two decades. Such a system would advance computer systems to a new level of interaction, providing much more natural feedback during human-computer interaction by analyzing the user's affective state. However, one of the key problems in this domain is a lack of generalization ability: model performance degrades dramatically when a model is trained on one corpus and evaluated on another. Although some studies have addressed this problem, the visual modality remains under-investigated. Therefore, we introduce a visual cross-corpus study conducted on eight corpora, which differ in recording conditions, participants' appearance characteristics, and complexity of data processing. We propose a visual end-to-end emotion recognition framework consisting of a robust pre-trained backbone model and a temporal sub-system that models temporal dependencies across many video frames. In addition, we provide a detailed analysis of the backbone model's mistakes and advantages, demonstrating its high generalization ability. Our results show that the backbone model achieves an accuracy of 66.4% on the AffectNet dataset, outperforming all state-of-the-art results. Moreover, the CNN-LSTM model demonstrates decent efficacy on dynamic visual datasets in cross-corpus experiments, achieving results comparable with the state of the art. We also provide the backbone and CNN-LSTM models for future researchers: they can be accessed via GitHub.
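The cross-corpus setting described in the abstract (train on some corpora, evaluate on a different one) can be illustrated with a minimal sketch. It assumes a leave-one-corpus-out protocol, which the abstract does not spell out, and the corpus names are hypothetical placeholders, since the abstract does not list the eight corpora.

```python
from typing import Iterator, List, Tuple


def leave_one_corpus_out(corpora: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training corpora, held-out test corpus) pairs.

    Assumption: each corpus is held out once while the rest are used
    for training. This is one common cross-corpus protocol, not
    necessarily the exact one used by the authors.
    """
    for held_out in corpora:
        train = [name for name in corpora if name != held_out]
        yield train, held_out


# Hypothetical placeholder names; the abstract does not name the corpora.
corpora = [f"corpus_{i}" for i in range(1, 9)]
splits = list(leave_one_corpus_out(corpora))

for train, test in splits:
    print(f"train on {len(train)} corpora, test on {test}")
```

Because the held-out corpus never contributes training data, the score on it reflects generalization to unseen recording conditions and participants rather than fit to a single corpus.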