Computer science
Task (project management)
Modality (human-computer interaction)
Mode
Artificial intelligence
Sentiment analysis
Multimodal learning
Natural language processing
Representation (politics)
Machine learning
Deep learning
Social science
Management
Sociology
Politics
Political science
Law
Economics
Authors
Yi Luo,Rui Wu,Jiafeng Liu,Xianglong Tang
Identifier
DOI:10.1016/j.neucom.2023.126836
Abstract
Multimodal Sentiment Analysis (MSA) is an active area of research that leverages multimodal signals for affective understanding of user-generated videos. Existing research tends to develop sophisticated fusion techniques that fuse unimodal representations into a multimodal representation and treat MSA as a single prediction task. However, we find that the text modality, backed by a pre-trained model (BERT), learns more semantic information and dominates training in multimodal models, inhibiting the learning of the other modalities. Moreover, single-task learning also suppresses the classification ability of each individual modality. In this paper, we propose a text-guided multi-task learning network to enhance the semantic information of the non-text modalities and improve the learning ability of the unimodal networks. We conducted experiments on the multimodal sentiment analysis datasets CMU-MOSI, CMU-MOSEI, and CH-SIMS. The results show that our method outperforms the current SOTA method.
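The abstract does not spell out the architecture, but the multi-task idea it describes can be sketched as follows: each modality keeps its own prediction head (the unimodal subtasks) alongside a head on the fused representation (the main task), so gradients from the shared label flow directly into every unimodal encoder rather than only through fusion. The sketch below is a minimal illustration under that assumption, not the authors' released code; the layer sizes, feature dimensions, mean-pooled linear encoders, and the 0.5 auxiliary-loss weight are all illustrative choices, not values from the paper.

import torch
import torch.nn as nn

class MultiTaskMSA(nn.Module):
    """Minimal multi-task MSA sketch: per-modality heads plus a fused head."""
    def __init__(self, text_dim=768, audio_dim=74, video_dim=35, hidden=128):
        super().__init__()
        # Placeholder encoders; in practice the text branch would be BERT and
        # the audio/video branches sequence models over extracted features.
        self.enc = nn.ModuleDict({
            "text": nn.Linear(text_dim, hidden),
            "audio": nn.Linear(audio_dim, hidden),
            "video": nn.Linear(video_dim, hidden),
        })
        # One regression head per modality (the unimodal subtasks) ...
        self.heads = nn.ModuleDict({m: nn.Linear(hidden, 1) for m in self.enc})
        # ... plus a head on the concatenation-fused representation (main task).
        self.fusion_head = nn.Linear(3 * hidden, 1)

    def forward(self, feats):
        h = {m: torch.relu(self.enc[m](feats[m])) for m in self.enc}
        uni = {m: self.heads[m](h[m]).squeeze(-1) for m in h}
        fused = self.fusion_head(
            torch.cat([h["text"], h["audio"], h["video"]], dim=-1)
        )
        return uni, fused.squeeze(-1)

def multi_task_loss(uni, fused, label, aux_weight=0.5):
    # Main multimodal loss plus weighted unimodal auxiliary losses.
    # aux_weight is an assumed hyperparameter, not taken from the paper.
    loss = nn.functional.mse_loss(fused, label)
    for pred in uni.values():
        loss = loss + aux_weight * nn.functional.mse_loss(pred, label)
    return loss

Training against this combined loss gives each non-text encoder a direct supervision signal of its own, which is plausibly the mechanism the abstract credits for keeping the BERT-backed text branch from dominating and for strengthening each unimodal network.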