Computer science
Artificial intelligence
Sentiment analysis
Fuse (electrical)
Set (abstract data type)
Feature (linguistics)
Natural language processing
Encoder
Visualization
Pattern recognition (psychology)
Linguistics
Philosophy
Electrical engineering
Programming language
Engineering
Operating system
Authors
Junyu Chen, Jing An, Hanjia Lyu, Jiebo Luo
Source
Journal: Cornell University - arXiv
Date: 2022-01-01
Identifier
DOI: 10.48550/arxiv.2211.12981
Abstract
Visual-textual sentiment analysis aims to predict sentiment from a paired image and text input. The main challenge of visual-textual sentiment analysis is learning effective visual features for sentiment prediction, since input images are often highly diverse. To address this challenge, we propose a new method that improves visual-textual sentiment analysis by introducing powerful expert visual features. The proposed method consists of four parts: (1) a visual-textual branch that learns features directly from data for sentiment analysis, (2) a visual expert branch with a set of pre-trained "expert" encoders that extract effective visual features, (3) a CLIP branch that implicitly models visual-textual correspondence, and (4) a multimodal feature fusion network, based on either BERT or an MLP, that fuses the multimodal features and makes the sentiment prediction. Extensive experiments on three datasets show that our method outperforms existing visual-textual sentiment analysis methods.
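The MLP variant of part (4) can be sketched as follows: features from the visual-textual branch, the expert branch, and the CLIP branch are concatenated and passed through a small feed-forward network that outputs sentiment class probabilities. This is a minimal NumPy illustration, not the authors' implementation; the embedding dimensions, the three-class setup, and the randomly initialized weights are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative only, not from the paper)
D_VT, D_EXPERT, D_CLIP, D_HIDDEN = 256, 512, 512, 128
NUM_CLASSES = 3  # e.g. negative / neutral / positive

def mlp_fusion(vt_feat, expert_feat, clip_feat, w1, b1, w2, b2):
    """Concatenate the three branch features and score sentiment
    with a two-layer MLP followed by a softmax."""
    x = np.concatenate([vt_feat, expert_feat, clip_feat])  # fused feature vector
    h = np.maximum(0.0, x @ w1 + b1)                       # ReLU hidden layer
    logits = h @ w2 + b2
    e = np.exp(logits - logits.max())                      # numerically stable softmax
    return e / e.sum()

# Randomly initialized weights stand in for a trained fusion network.
w1 = rng.normal(scale=0.02, size=(D_VT + D_EXPERT + D_CLIP, D_HIDDEN))
b1 = np.zeros(D_HIDDEN)
w2 = rng.normal(scale=0.02, size=(D_HIDDEN, NUM_CLASSES))
b2 = np.zeros(NUM_CLASSES)

probs = mlp_fusion(rng.normal(size=D_VT),
                   rng.normal(size=D_EXPERT),
                   rng.normal(size=D_CLIP),
                   w1, b1, w2, b2)
print(probs.shape, float(probs.sum()))
```

Concatenation followed by an MLP is the simplest late-fusion choice; the BERT-based variant described in the abstract would instead treat the branch features as a token sequence and fuse them with self-attention.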