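Keywords
Emotion recognition
Artificial intelligence
Feature (linguistics)
Affective computing
Pattern recognition (psychology)
Computer science
Facial expression
Fusion
Emotion classification
Speech recognition
Facial recognition system
Feature extraction
Computer vision
Linguistics
Philosophy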
Authors
Jindi Bao, Jianjun Qian, Jian Yang
Identifier
DOI:10.1109/taffc.2025.3528636
Abstract
Multimodal emotion recognition based on facial videos aims to extract features from different modalities to identify human emotions. Previous work focuses on designing fusion schemes that combine heterogeneous modal data. However, most studies overlook the distinct role each modality plays in emotion recognition and do not fully exploit the intrinsic connections between modalities. Furthermore, the multimodal data from facial videos contain various distractions that hinder emotion analysis. How to reduce the impact of these distractions and enable a model to mine effective emotion information from different modalities remains a challenging problem. To address these issues, we propose an SVD-guided multimodal feature fusion method for emotion recognition from facial videos, which uses a hierarchical fusion mechanism and adopts a different loss strategy at each level to learn multimodal feature representations. Specifically, we fuse the facial expression with the rPPG signal (or Point-of-Gaze) using a weak supervision strategy and contrastive learning. The fused feature of facial expression and rPPG signal and the fused feature of facial expression and Point-of-Gaze are then combined to construct a unified multimodal feature matrix. Singular Value Decomposition (SVD) is applied to this matrix to reduce the redundant information introduced by multimodal fusion and to guide the neural network toward learning discriminative emotion features. In addition, a consistency loss is developed to enhance the multimodal representation. Experiments on three public datasets show that the proposed method outperforms the compared methods.
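The abstract does not give the exact formulation, so the following is only a minimal NumPy sketch of the two-level idea under stated assumptions, not the authors' implementation. It assumes each modality yields one d-dimensional feature vector per sample, uses a one-directional InfoNCE loss as a stand-in for the contrastive alignment, replaces the learned (weakly supervised) fusion with simple averaging, and interprets "refining redundancy" as a rank-k truncated SVD reconstruction of a per-sample matrix whose two rows are the expression+rPPG and expression+gaze fused features. The names `info_nce`, `svd_refine`, `face`, `rppg` and `gaze` are hypothetical, and the data are synthetic.

```python
import numpy as np


def info_nce(a, b, temperature=0.1):
    """InfoNCE loss (one direction): row i of `a` and row i of `b` form a
    positive pair; all other rows of `b` act as negatives for row i."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)   # L2-normalise rows
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature                   # (n, n) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilise the softmax
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))       # positives lie on the diagonal


def svd_refine(matrix, k):
    """Rank-k reconstruction of a feature matrix via truncated SVD.

    Keeping only the k largest singular values discards low-energy
    directions, which this sketch treats as redundant information."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :], s


rng = np.random.default_rng(0)
n, d = 8, 16
face = rng.normal(size=(n, d))                # hypothetical facial-expression features
rppg = face + 0.1 * rng.normal(size=(n, d))   # hypothetical rPPG features (correlated)
gaze = face + 0.1 * rng.normal(size=(n, d))   # hypothetical Point-of-Gaze features

# Level 1: contrastive alignment of each pair, then fusion (averaging as a stand-in)
loss_er = info_nce(face, rppg)
loss_eg = info_nce(face, gaze)
fused_er = (face + rppg) / 2
fused_eg = (face + gaze) / 2

# Level 2: stack both fused features per sample into a (2, d) matrix and refine it
unified = np.stack([fused_er, fused_eg], axis=1)              # shape (n, 2, d)
refined = np.stack([svd_refine(m, k=1)[0] for m in unified])  # rank-1 per sample
```

With k=1 each refined matrix keeps only the direction the two fused features share most strongly; in a trained network the same decomposition would act on learned features, and the choice of k (or a soft weighting of singular values) would be a design decision the abstract leaves open.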