Modal verb
Balance (ability)
Diversity (politics)
Sentiment analysis
Multimodal
Schema therapy
Computer science
Psychology
Linguistics
Cognitive psychology
Artificial intelligence
Sociology
Philosophy
Anthropology
Neuroscience
Chemistry
Polymer chemistry
Psychotherapist
Authors
Meng Li, Zhenfang Zhu, Kefeng Li, Hongli Pei
Identifier
DOI: 10.1109/taffc.2024.3430045
Abstract
Multimodal Sentiment Analysis (MSA) is the technology of intelligently recognizing and assessing human sentiments from multiple data forms such as text, image, and audio. Although current mainstream methods have made significant progress, MSA still faces the following issues: 1) most current methods train models on pre-extracted features, lacking a sufficient understanding of sentiment diversity in multimodal data, which may even lead to the loss of critical information in the raw data; and 2) the textual modality, which possesses high-level semantic features, should typically dominate the fusion process, yet current methods fail to fully leverage this characteristic to balance modality information. To address these issues, we propose a novel Multimodal Sentiment Analysis framework using Multimodal-Prefixed and Cross-Modal Attention (DB-MPCA). For the first issue, DB-MPCA employs multimodal raw data for pre-training, which not only allows in-depth exploration of multimodal information but also significantly enhances the model's learning capability and generalization, while reducing the substantial costs of manual annotation. For the second issue, DB-MPCA introduces two prefix encoders that convert acoustic and visual features into prefix tokens. These tokens are then embedded into a pre-trained language model, where they are encoded together with the textual tokens. In this way, DB-MPCA effectively learns cross-modal attention while maintaining the dominance of the textual modality, thereby optimizing the fusion of modalities. Comprehensive experiments on the widely used CMU-MOSI dataset demonstrate the effectiveness of our model and its superiority over baseline models.
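The prefix-token fusion the abstract describes can be sketched in miniature as follows. This is an illustrative NumPy sketch, not the paper's actual implementation: all dimensions, the mean-pooling in the prefix encoders, and the single identity-projection attention head are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

d_text, d_audio, d_vision = 16, 8, 12  # hypothetical feature dimensions
n_prefix = 4                           # prefix tokens per non-text modality

def prefix_encoder(feats, W, b):
    """Pool a variable-length feature sequence and project it into
    n_prefix tokens living in the text embedding space."""
    pooled = feats.mean(axis=0)            # (d_in,)
    tokens = pooled @ W + b                # (n_prefix * d_text,)
    return tokens.reshape(n_prefix, d_text)

# Hypothetical projection weights (learned in a real model).
W_a = rng.normal(size=(d_audio, n_prefix * d_text)) * 0.1
b_a = np.zeros(n_prefix * d_text)
W_v = rng.normal(size=(d_vision, n_prefix * d_text)) * 0.1
b_v = np.zeros(n_prefix * d_text)

audio = rng.normal(size=(50, d_audio))    # 50 acoustic frames
vision = rng.normal(size=(30, d_vision))  # 30 visual frames
text = rng.normal(size=(10, d_text))      # 10 textual token embeddings

a_prefix = prefix_encoder(audio, W_a, b_a)
v_prefix = prefix_encoder(vision, W_v, b_v)

# Prefix tokens are prepended so the language model encodes them
# jointly with the textual tokens.
sequence = np.concatenate([a_prefix, v_prefix, text], axis=0)  # (18, 16)

def self_attention(x):
    """Single-head scaled dot-product self-attention with identity
    Q/K/V projections, so every token attends across modalities."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

fused = self_attention(sequence)

# Reading the sentiment representation only from the textual positions
# keeps the textual modality dominant in the fusion.
text_repr = fused[2 * n_prefix:]
print(sequence.shape, text_repr.shape)  # (18, 16) (10, 16)
```

In the actual framework the prefix encoders and the language model are trained jointly, so the prefix tokens learn to carry acoustic and visual sentiment cues into the text encoder's attention.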