Computer science
Artificial intelligence
Sentiment analysis
Robustness
Machine learning
Deep learning
Natural language processing
Information retrieval
Authors
Jie Mu, Feiping Nie, Wei Wang, Jian Xu, Jing Zhang, Han Liu
Identifier
DOI: 10.1109/TKDE.2023.3345022
Abstract
Multimodal aspect-level sentiment analysis has attracted increasing attention in recent years. However, existing methods have two unaddressed limitations: (1) because labelled pre-training data dedicated to sentiment analysis is scarce, methods that rely on a separate pre-training stage produce suboptimal predictions; (2) most existing methods employ a self-attention encoder to fuse multimodal tokens, which not only ignores the alignment relationships between tokens of different modalities but also prevents the model from capturing the semantic links between images and texts. In this paper, we propose a momentum contrastive learning network (MOCOLNet) to overcome the above limitations. First, we merge the pre-training stage with the training stage to obtain an end-to-end training scheme that achieves better predictions with less labelled data dedicated to sentiment analysis. Second, we propose a multimodal contrastive learning method that aligns the representations of the different modalities before fusion, and we design a cross-modal matching strategy that provides semantic interaction between texts and images. Moreover, we introduce an auxiliary momentum strategy to increase the robustness of the model. We also analyse the effectiveness of the proposed multimodal contrastive learning method through the lens of mutual information theory. Experiments verify that the proposed MOCOLNet outperforms other strong baselines.
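To make the alignment idea concrete, below is a minimal, illustrative PyTorch sketch of image-text contrastive alignment with momentum-updated encoders and a symmetric InfoNCE loss, one standard way to realise the "multimodal contrastive learning" and "auxiliary momentum strategy" the abstract describes. The encoder stand-ins, feature dimensions, and hyperparameters are assumptions for illustration, not the authors' MOCOLNet implementation.

```python
# Hedged sketch: momentum encoders + symmetric InfoNCE image-text alignment.
# All module names and sizes below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MomentumContrastiveAligner(nn.Module):
    def __init__(self, dim=256, momentum=0.995, temperature=0.07):
        super().__init__()
        # Stand-ins for real image/text encoders (e.g. a ViT and a BERT).
        self.img_enc = nn.Linear(512, dim)
        self.txt_enc = nn.Linear(768, dim)
        # Momentum copies: updated only by EMA, never by gradients.
        self.img_enc_m = nn.Linear(512, dim)
        self.txt_enc_m = nn.Linear(768, dim)
        for m, s in [(self.img_enc_m, self.img_enc),
                     (self.txt_enc_m, self.txt_enc)]:
            m.load_state_dict(s.state_dict())
            for p in m.parameters():
                p.requires_grad = False
        self.momentum = momentum
        self.temperature = temperature

    @torch.no_grad()
    def _ema_update(self):
        # theta_m <- m * theta_m + (1 - m) * theta
        for m, s in [(self.img_enc_m, self.img_enc),
                     (self.txt_enc_m, self.txt_enc)]:
            for pm, ps in zip(m.parameters(), s.parameters()):
                pm.mul_(self.momentum).add_(ps, alpha=1.0 - self.momentum)

    def forward(self, img_feat, txt_feat):
        self._ema_update()  # in practice, run once per training step
        # Project and L2-normalise both modalities into a shared space.
        v = F.normalize(self.img_enc(img_feat), dim=-1)
        t = F.normalize(self.txt_enc(txt_feat), dim=-1)
        with torch.no_grad():
            v_m = F.normalize(self.img_enc_m(img_feat), dim=-1)
            t_m = F.normalize(self.txt_enc_m(txt_feat), dim=-1)
        # Similarities against slowly moving momentum targets;
        # the diagonal entries correspond to matched image-text pairs.
        logits_i2t = v @ t_m.T / self.temperature
        logits_t2i = t @ v_m.T / self.temperature
        labels = torch.arange(v.size(0), device=v.device)
        # Symmetric InfoNCE: minimising it maximises a lower bound on the
        # mutual information between matched image and text representations.
        return 0.5 * (F.cross_entropy(logits_i2t, labels)
                      + F.cross_entropy(logits_t2i, labels))

# Usage with random stand-in features (batch of 8 image-text pairs):
model = MomentumContrastiveAligner()
loss = model(torch.randn(8, 512), torch.randn(8, 768))
loss.backward()
```

The momentum copies change slowly across steps, which stabilises the contrastive targets; this is also why the InfoNCE view connects naturally to the mutual-information analysis mentioned in the abstract.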