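Sentiment analysis
Contrastive analysis
Computer science
Artificial intelligence
Psychology
Natural language processing
Cognitive psychology
Linguistics
Philosophy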
Authors
Cunhang Fan, Kang Zhu, Jianhua Tao, Guofeng Yi, Jun Xue, Zhao Lv
Identifier
DOI:10.1109/taffc.2024.3423671
Abstract
Recently, multimodal fusion efforts have achieved remarkable success in Multimodal Sentiment Analysis (MSA). However, most existing methods are based on model-level fusion, and the challenge of heterogeneity between modalities is not well resolved. Heterogeneity lies in the different feature distributions and distinct representation spaces among modalities. To mitigate this problem, we propose that fusion is a progressive process, and we introduce a novel multi-level contrastive learning and multi-layer convolution fusion (MCL-MCF) method for MSA. Based on the relationships among multimodal data, the fusion process is divided into three levels: single-modal to single-modal, single-modal to bimodal or trimodal, and semantic consistency among higher-level fused modalities. The first-level contrastive learning alleviates heterogeneity between unimodal modalities at the early level of multimodal feature fusion. The second-level contrastive learning mitigates heterogeneity between unimodal and fused modalities. At the third level, we introduce a tensor convolution fusion (TCF) module that extracts high-level semantic features from the fused modalities and mitigates heterogeneity at the higher feature level through contrastive learning. To simulate fusion as a progressive process, MCF is proposed to fuse shallow and deep features to model complex relationships among modalities. Experiments on three public datasets show that our approach achieves state-of-the-art performance.
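As a rough illustration of what contrastive alignment between two modalities can look like, the sketch below computes a standard InfoNCE loss over paired embeddings. This is an assumption for illustration only: the abstract does not state which contrastive objective MCL-MCF uses, and the names `l2_normalize`, `info_nce`, `temperature`, and the toy `text` and `audio` vectors are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; row i of `anchors` is paired with row i of `positives`.

    All other rows of `positives` act as negatives for anchor i.
    This is the generic InfoNCE form, not necessarily the paper's loss.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Temperature-scaled cosine similarity of anchor i with every candidate.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


# Hypothetical toy embeddings: two samples seen through two modalities.
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[0.9, 0.1], [0.1, 0.9]]

aligned = info_nce(text, audio)         # correct pairing
shuffled = info_nce(text, audio[::-1])  # deliberately mismatched pairing
```

With the correct pairing the loss is close to zero, while the mismatched pairing yields a much larger value. In a multi-level setup like the one the abstract describes, the same kind of loss could be applied between unimodal pairs at one level and between unimodal and fused representations at another, though how the paper actually does this is not specified here.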