Computer science
Natural language processing
Artificial intelligence
Human-computer interaction
Cognitive science
Psychology
Authors
Rui Wang,Chaopeng Guo,Mohammad Shabaz,Imad Rida,Erik Cambria,Xianxun Zhu
Identifier
DOI:10.1109/tcss.2025.3572495
Abstract
Multimodal emotion analysis is pivotal in decoding complex human affect by integrating diverse data sources such as text, audio, and visual signals. In this article, we introduce contextual interaction-based multimodal emotion analysis with enhanced semantic information (CIME), a novel spatio-temporal interaction network that significantly improves emotion recognition accuracy and robustness. CIME employs a text-centric cross-modal attention mechanism to refine semantic representations, while simultaneously leveraging a graph convolutional network to model contextual dialog information by capturing both intraspeaker and interspeaker relationships. This dual approach enables the effective fusion of modality-specific cues and the mining of latent emotional associations across modalities. Extensive experiments conducted on benchmark datasets, including IEMOCAP and MOSEI, demonstrate that CIME consistently outperforms existing state-of-the-art methods in terms of overall classification accuracy and weighted F1-scores. Furthermore, detailed ablation studies underscore the critical contributions of both the cross-modal attention and graph-based contextual modules.
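The abstract names two generic building blocks: cross-modal attention in which text acts as the query, and a graph convolution over a dialog graph whose edges distinguish intraspeaker from interspeaker relations. The following is a minimal NumPy sketch of those two generic ideas, not CIME's actual architecture. It assumes one feature vector per utterance, uses audio as the attended modality (visual would be handled the same way), connects utterances within a hypothetical context `window`, and uses a relational GCN layer with one weight matrix per edge type. All function names, dimensions, the concatenation-based fusion, and the random weights are illustrative placeholders.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def text_centric_cross_attention(text, other, w_q, w_k, w_v):
    # Text features supply the queries; the other modality (audio or visual)
    # supplies keys and values, so each output row stays aligned with a text row.
    q, k, v = text @ w_q, other @ w_k, other @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v, weights


def build_speaker_graph(speakers, window=2):
    # Link each utterance to its neighbours within +/- `window` turns,
    # splitting edges into same-speaker (intra) and cross-speaker (inter).
    n = len(speakers)
    intra, inter = np.zeros((n, n)), np.zeros((n, n))
    for i in range(n):
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if i == j:
                continue
            if speakers[i] == speakers[j]:
                intra[i, j] = 1.0
            else:
                inter[i, j] = 1.0
    return intra, inter


def row_normalize(a):
    # Mean aggregation over neighbours; rows with no neighbours stay zero.
    deg = a.sum(axis=1, keepdims=True)
    return np.divide(a, deg, out=np.zeros_like(a), where=deg > 0)


def relational_gcn_layer(h, intra, inter, w_self, w_intra, w_inter):
    # Self-loop term plus a separate weight matrix per relation type, then ReLU.
    out = (h @ w_self
           + row_normalize(intra) @ h @ w_intra
           + row_normalize(inter) @ h @ w_inter)
    return np.maximum(out, 0.0)


# Toy dialog: 4 utterances alternating between two speakers.
rng = np.random.default_rng(0)
n, d_text, d_audio, d = 4, 8, 6, 5
speakers = ["A", "B", "A", "B"]
text = rng.normal(size=(n, d_text))
audio = rng.normal(size=(n, d_audio))

attended, attn = text_centric_cross_attention(
    text, audio,
    rng.normal(size=(d_text, d)),
    rng.normal(size=(d_audio, d)),
    rng.normal(size=(d_audio, d)),
)
# One simple fusion choice: concatenate a projected text view with the
# audio-attended view, giving (n, 2 * d) node features.
fused = np.concatenate([text @ rng.normal(size=(d_text, d)), attended], axis=1)

intra, inter = build_speaker_graph(speakers, window=2)
w_self, w_intra, w_inter = (rng.normal(size=(2 * d, d)) for _ in range(3))
h = relational_gcn_layer(fused, intra, inter, w_self, w_intra, w_inter)
print(h.shape)  # → (4, 5)
```

Keeping separate `w_intra` and `w_inter` matrices is what lets the layer treat a speaker's own earlier turns differently from an interlocutor's turns. Setting `window` to the dialog length connects every pair of utterances; the abstract does not state which context range CIME actually uses.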