Modal verb
Balance (ability)
Diversity (politics)
Sentiment analysis
Multimodal
Schema therapy
Computer science
Psychology
Linguistics
Cognitive psychology
Artificial intelligence
Sociology
Philosophy
Anthropology
Neuroscience
Chemistry
Polymer chemistry
Psychotherapist
Authors
Meng Li, Zhenfang Zhu, Kefeng Li, Hongli Pei
Identifier
DOI: 10.1109/taffc.2024.3430045
Abstract
Multimodal Sentiment Analysis (MSA) is the technology of intelligently recognizing and assessing human sentiments from multiple data forms such as text, image, and audio. Although current mainstream methods have made significant progress, MSA still faces the following issues: 1) most current methods train models on pre-extracted features, lacking a sufficient understanding of sentiment diversity in multimodal data, which may even lead to the loss of critical information in the raw data; and 2) the textual modality, which possesses high-level semantic features, should typically dominate the fusion process, yet current methods fail to fully leverage this characteristic to balance modality information. To address these issues, we propose a novel Multimodal Sentiment Analysis framework using Multimodal-Prefixed and Cross-Modal Attention (DB-MPCA). For the first issue, DB-MPCA employs multimodal raw data for pre-training, which not only allows in-depth exploration of multimodal information but also significantly enhances the model's learning capability and generalization, while reducing the substantial costs of manual annotation. For the second issue, DB-MPCA introduces two prefix encoders that convert acoustic and visual features into prefix tokens. These tokens are then embedded into a pre-trained language model, where they are encoded together with the textual tokens. In this way, DB-MPCA effectively learns cross-modal attention while maintaining the dominance of the textual modality, thereby optimizing the fusion of modalities. Comprehensive experiments on the widely used CMU-MOSI dataset demonstrate the effectiveness of our model and its superiority over baseline models.
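The prefix-token fusion the abstract describes can be sketched in miniature as follows. This is an illustrative NumPy sketch, not the paper's actual implementation: all dimensions, the mean-pooling in the prefix encoders, and the single identity-projection attention head are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

d_text, d_audio, d_vision = 16, 8, 12  # hypothetical feature dimensions
n_prefix = 4                           # prefix tokens per non-text modality

def prefix_encoder(feats, W, b):
    """Pool a variable-length feature sequence and project it into
    n_prefix tokens living in the text embedding space."""
    pooled = feats.mean(axis=0)            # (d_in,)
    tokens = pooled @ W + b                # (n_prefix * d_text,)
    return tokens.reshape(n_prefix, d_text)

# Hypothetical projection weights (learned in a real model).
W_a = rng.normal(size=(d_audio, n_prefix * d_text)) * 0.1
b_a = np.zeros(n_prefix * d_text)
W_v = rng.normal(size=(d_vision, n_prefix * d_text)) * 0.1
b_v = np.zeros(n_prefix * d_text)

audio = rng.normal(size=(50, d_audio))    # 50 acoustic frames
vision = rng.normal(size=(30, d_vision))  # 30 visual frames
text = rng.normal(size=(10, d_text))      # 10 textual token embeddings

a_prefix = prefix_encoder(audio, W_a, b_a)
v_prefix = prefix_encoder(vision, W_v, b_v)

# Prefix tokens are prepended so the language model encodes them
# jointly with the textual tokens.
sequence = np.concatenate([a_prefix, v_prefix, text], axis=0)  # (18, 16)

def self_attention(x):
    """Single-head scaled dot-product self-attention with identity
    Q/K/V projections, so every token attends across modalities."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

fused = self_attention(sequence)

# Reading the sentiment representation only from the textual positions
# keeps the textual modality dominant in the fusion.
text_repr = fused[2 * n_prefix:]
print(sequence.shape, text_repr.shape)  # (18, 16) (10, 16)
```

In the actual framework the prefix encoders and the language model are trained jointly, so the prefix tokens learn to carry acoustic and visual sentiment cues into the text encoder's attention.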