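Computer science
Encoder
Image (mathematics)
Artificial intelligence
Fuse (electrical)
Focus (optics)
Transformer
Computer vision
Pattern recognition (psychology)
Quantum mechanics
Operating system
Electrical engineering
Optics
Physics
Engineering
Voltage
Authors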
Xingwang Xiao,Yuanyuan Pu,Zhengpeng Zhao,Jinjing Gu,Dan Xu
Identifier
DOI:10.1109/ijcnn54540.2023.10191445
Abstract
Exploring the interaction between image and text is highly beneficial for image-text sentiment analysis. However, most methods learn only the forward interaction in forward image-text features and fail to capture the backward interaction in backward image-text features, which loses necessary information embedded in the backward interaction. In this paper, we propose the Bidirectional Interaction Transformer (BIT), which models both forward and backward image-text interactions for image-text sentiment analysis. Specifically, we first encode the image and text into forward and backward features. These features are then fed into a Bidirectional Interaction Encoder (BIE) with Forward Interaction and Back Interaction branches to model bidirectional (i.e., forward and backward) image-text interaction. Finally, a Two-scale Adaptive Gating Fusion (TAGF) module is designed to adaptively fuse the forward and backward interactions learned by the BIE. Extensive experiments on two public datasets demonstrate the effectiveness of the proposed model.
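The abstract does not specify how TAGF computes its gates, so the following is only a generic illustration of adaptive gated fusion, not the authors' module. It assumes a single scalar sigmoid gate computed from both feature vectors; the function name `gated_fusion`, the weight vectors `w_f` and `w_b`, and the `bias` term are all hypothetical, and the "two-scale" aspect is not modeled here.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(forward, backward, w_f, w_b, bias=0.0):
    """Fuse two equal-length feature vectors with one scalar gate.

    Hypothetical sketch: the gate is sigmoid(w_f . forward + w_b . backward + bias),
    and the output is gate * forward + (1 - gate) * backward.
    This is NOT the TAGF module from the paper, only a generic gating pattern.
    """
    if not (len(forward) == len(backward) == len(w_f) == len(w_b)):
        raise ValueError("all vectors must have the same length")

    # Gate score depends on both inputs, so the mix adapts per example.
    score = (
        sum(w * x for w, x in zip(w_f, forward))
        + sum(w * x for w, x in zip(w_b, backward))
        + bias
    )
    gate = sigmoid(score)

    # Convex combination of the forward and backward features.
    fused = [gate * f + (1.0 - gate) * b for f, b in zip(forward, backward)]
    return fused, gate


if __name__ == "__main__":
    fwd = [1.0, 2.0]
    bwd = [3.0, 4.0]
    # With zero weights the gate is 0.5, so the result is the plain average.
    print(gated_fusion(fwd, bwd, [0.0, 0.0], [0.0, 0.0]))
```

In a trained model the gate parameters would be learned, letting the network lean on whichever interaction branch is more informative for a given image-text pair.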