Computer science
Artificial intelligence
Sentiment analysis
Pattern recognition (psychology)
Natural language processing
Authors
Jia Li,Tinghuai Ma,Huan Rong,Victor S. Sheng,Xuejian Huang,Xintong Xie
Source
Journal: IEEE Transactions on Artificial Intelligence
[Institute of Electrical and Electronics Engineers]
Date: 2023-01-01
Volume/Issue: : 1-11
Identifier
DOI:10.1109/tai.2023.3341879
Abstract
With the development of fine-grained multimodal sentiment analysis tasks, target-oriented multimodal sentiment analysis, which aims to classify the sentiment of a target with the help of textual and associated image features, has received more attention. Existing methods focus on exploring fine-grained image features and incorporate complex transformer-based fusion strategies, while ignoring the heavy computational burden. Recently, some lightweight MLP-based methods have been successfully applied to multimodal sentiment classification tasks. In this paper, we propose an effective rearrangement and restore mixer model (RR-Mixer) for target-oriented multimodal sentiment classification (TMSC), which is dedicated to the interaction of image, text, and targets along the modal axis, sequential axis, and feature-channel axis through rearrangement and restore operations. Specifically, we use the pre-trained Vision Transformer (ViT) and Robustly optimized BERT (RoBERTa) models to extract image and textual features, respectively. Further, we adopt cosine similarity to select the most semantically relevant image features. Then, an RR-Mixer module is designed to mix the multimodal features, with the core technology consisting of rolling, grouping rearrangement and restore operations. Moreover, we introduce an MLP unit to learn the information of different modalities for inter-modal interaction. The results show that our model achieves superior performance on two benchmark multimodal datasets, TWITTER-15 and TWITTER-17, with significant improvements of 4.66% and 1.26% in macro-F1, respectively.
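The abstract mentions selecting the most semantically relevant image features by cosine similarity but does not give the exact rule. The following is a minimal sketch of one plausible reading, assuming each image patch feature is scored against a single query vector (for example, a pooled text or target embedding) and the top k patches are kept. The names `cosine_similarity`, `select_relevant_patches`, and the parameter `k` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_relevant_patches(
    patches: Sequence[Sequence[float]],
    query: Sequence[float],
    k: int,
) -> List[int]:
    """Return the indices of the k patches most similar to the query, best first.

    Assumption: top-k selection; the paper may use a threshold or another rule.
    """
    scores = [cosine_similarity(p, query) for p in patches]
    ranked = sorted(range(len(patches)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy 2-D features standing in for ViT patch embeddings and a text embedding.
toy_patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
toy_query = [1.0, 0.2]
top_indices = select_relevant_patches(toy_patches, toy_query, k=2)
```

In a real pipeline the inputs would be the ViT patch embeddings and a RoBERTa-derived vector projected to the same dimension; this sketch only illustrates the ranking step.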