Keywords
Photoplethysmography; Computer science; Emotion recognition; Arousal; Valence; Speech recognition; Emotion classification; Artificial intelligence; Facial expression; Robustness; Facial recognition system; Modality (human-computer interaction); Computer vision; Pattern recognition; Psychology; Filter (signal processing)
Authors
Jixiang Li, Jianxin Peng
Identifier
DOI: 10.1109/JBHI.2024.3430310
Abstract
Emotion is a complex physiological phenomenon, and a single modality may be insufficient for accurately determining human emotional states. This paper proposes an end-to-end multimodal emotion recognition method based on facial expressions and non-contact physiological signals. Facial expression features and remote photoplethysmography (rPPG) signals are extracted from facial video data, and a transformer-based cross-modal attention mechanism (TCMA) learns the correlation between the two modalities. The results show that combining facial expressions with accurate rPPG signals slightly improves emotion recognition accuracy. Performance improves further with TCMA, which achieves binary classification accuracies of 91.11% for valence and 90.00% for arousal. Moreover, in experiments on the whole dataset, using TCMA for modal fusion yields accuracy gains of 7.31% and 4.23% in the binary classification of valence and arousal, respectively, and of 5.36% in the four-class valence-arousal classification, compared with using the facial expression modality alone, demonstrating the effectiveness and robustness of TCMA. This method makes multimodal emotion recognition from facial expressions and contactless physiological signals feasible in real-world settings.
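The abstract's two technical ingredients can be illustrated with short sketches. First, non-contact pulse recovery: a classical rPPG estimate spatially averages the green channel over a face region and band-pass filters the result to the heart-rate band. This is the textbook green-channel method, shown only to convey the idea; the function name, ROI handling, frame rate, and filter settings below are illustrative assumptions, not the paper's extraction pipeline.

```python
# Minimal sketch of classical green-channel rPPG recovery from face video.
# All parameters here (fs=30, 0.7-4 Hz band, filter order) are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

def rppg_from_frames(frames, fs=30.0):
    """frames: (T, H, W, 3) uint8 face-ROI crops; fs: video frame rate in Hz."""
    green = frames[:, :, :, 1].astype(np.float64)
    raw = green.mean(axis=(1, 2))             # spatial mean per frame
    raw = (raw - raw.mean()) / raw.std()      # normalize the trace
    b, a = butter(3, [0.7, 4.0], btype="bandpass", fs=fs)  # heart-rate band
    return filtfilt(b, a, raw)                # estimated pulse waveform

# Toy usage: random noise stands in for real face crops.
frames = np.random.randint(0, 256, size=(300, 64, 64, 3), dtype=np.uint8)
pulse = rppg_from_frames(frames)
print(pulse.shape)  # (300,)
```

Second, a minimal sketch of transformer-style cross-modal attention fusion in the spirit of TCMA: each modality's feature sequence attends to the other, and the pooled results are concatenated before the valence and arousal heads. All module names, dimensions, and head counts are assumptions for illustration; the authors' TCMA architecture may differ.

```python
# Sketch of bidirectional cross-modal attention fusion for two feature
# sequences (facial expression and rPPG). Dimensions are hypothetical.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        # Face features attend to rPPG features, and vice versa.
        self.face_to_rppg = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.rppg_to_face = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_face = nn.LayerNorm(dim)
        self.norm_rppg = nn.LayerNorm(dim)
        # Two binary heads: valence (positive/negative), arousal (high/low).
        self.valence_head = nn.Linear(2 * dim, 2)
        self.arousal_head = nn.Linear(2 * dim, 2)

    def forward(self, face_seq, rppg_seq):
        # face_seq: (batch, T_face, dim) frame-level expression features
        # rppg_seq: (batch, T_rppg, dim) windowed rPPG signal features
        face_ctx, _ = self.face_to_rppg(face_seq, rppg_seq, rppg_seq)
        rppg_ctx, _ = self.rppg_to_face(rppg_seq, face_seq, face_seq)
        face_out = self.norm_face(face_seq + face_ctx).mean(dim=1)  # pool
        rppg_out = self.norm_rppg(rppg_seq + rppg_ctx).mean(dim=1)
        fused = torch.cat([face_out, rppg_out], dim=-1)
        return self.valence_head(fused), self.arousal_head(fused)

# Toy usage: random tensors stand in for real extractor outputs.
model = CrossModalFusion()
face = torch.randn(8, 32, 128)   # 8 clips, 32 frames of expression features
rppg = torch.randn(8, 16, 128)   # 8 clips, 16 rPPG feature windows
valence_logits, arousal_logits = model(face, rppg)
print(valence_logits.shape, arousal_logits.shape)  # torch.Size([8, 2]) twice
```

The design choice worth noting is the symmetric attention: letting each modality query the other is one common way to learn cross-modal correlation, which matches the abstract's stated goal, though the paper's exact fusion topology is not specified here.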