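Modal verb
Feature extraction
Computer science
Modality (human-computer interaction)
Convolutional neural network
Feature (linguistics)
Pooling
Mode
Convolution (computer science)
Artificial intelligence
Pattern recognition (psychology)
Artificial neural network
Linguistics
Sociology
Social science
Polymer chemistry
Philosophy
Chemistry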
Authors
Ming Xu,Tuo Shi,Hao Zhang,Zeyi Liu,Xiao He
Source
Journal: IEEE Transactions on Artificial Intelligence
[Institute of Electrical and Electronics Engineers]
Date: 2025-01-01
Volume (issue): 6 (5): 1429-1438
Citations: 6
Identifier
DOI: 10.1109/tai.2024.3523250
Abstract
Emotion recognition from physiological data has advanced notably in recent years. However, existing multimodal methods often overlook the interrelations between modalities, such as video and electroencephalography (EEG) data. This paper proposes a feature fusion-based hierarchical cross-modal spatial fusion network that effectively integrates EEG and video features. An EEG feature extraction network based on 1D convolution and a video feature extraction network based on 3D convolution thoroughly extract the features of each modality. To promote sufficient interaction between the two modalities, a hierarchical cross-modal coordinated attention module is proposed. Additionally, to enhance the network's perceptual ability for emotion-related features, a multiscale spatial pyramid pooling module is designed. A self-distillation method is also introduced, which improves performance while reducing the number of parameters in the network. The hierarchical cross-modal spatial fusion network achieved an accuracy of 97.78% on the valence-arousal dimension of the DEAP dataset and 60.59% on the MAHNOB-HCI dataset, reaching the state-of-the-art level.
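
The abstract names a multiscale spatial pyramid pooling module but gives no implementation details. As a rough illustration of the general idea only, the sketch below shows generic spatial pyramid pooling on a single 2D feature map in plain Python. It is not the authors' module: the function names, the pyramid levels (1, 2, 4), and the choice of max pooling are all assumptions made for this example.

```python
from typing import List, Sequence


def adaptive_max_pool_2d(fmap: List[List[float]], bins: int) -> List[float]:
    """Max-pool an H x W map into a bins x bins grid, flattened row-major."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        # Bin edges: floor for the start, ceil for the end, so no bin is empty.
        r0 = (i * h) // bins
        r1 = -((-(i + 1) * h) // bins)
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = -((-(j + 1) * w) // bins)
            pooled.append(
                max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1))
            )
    return pooled


def spatial_pyramid_pool(
    fmap: List[List[float]], levels: Sequence[int] = (1, 2, 4)
) -> List[float]:
    """Concatenate pooled grids from coarse to fine (hypothetical levels)."""
    out: List[float] = []
    for bins in levels:
        out.extend(adaptive_max_pool_2d(fmap, bins))
    return out


if __name__ == "__main__":
    # A 4 x 4 map holding the values 0..15 in row-major order.
    fmap = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    print(spatial_pyramid_pool(fmap, levels=(1, 2)))
```

Because each level always yields bins x bins values, the output length depends only on the chosen levels (1 + 4 + 16 = 21 for the defaults), not on the input size. This is the property that makes pyramid pooling convenient for combining features computed at several scales.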