Computer science
Modal verb
Sentiment analysis
Sentence
Modality (human–computer interaction)
Artificial intelligence
Natural language processing
Dependency (UML)
Macro
Transformer
Graphics
Expression (computer science)
Fuse (electrical)
Theoretical computer science
Electrical engineering
Physics
Engineering
Voltage
Chemistry
Polymer chemistry
Programming language
Quantum mechanics
Authors
Luwei Xiao, Xingjiao Wu, Shuwen Yang, Junjie Xu, Jie Zhou, Liang He
Identifier
DOI:10.1016/j.ipm.2023.103508
Abstract
Multi-modal Aspect-based Sentiment Analysis (MABSA) aims to predict the sentiment polarity of aspects within a given sentence based on the correlation between the sentence and its accompanying image. Comprehending multi-modal sentiment expression requires strong cross-modal alignment and fusion ability. Previous state-of-the-art (SOTA) models fail to explicitly align valuable visual clues with aspect and sentiment information in textual representations, and they overlook the syntactic dependency information available in the text modality. We present CoolNet (Cross-modal Fine-grained Alignment and Fusion Network) to boost the ability of visual-language models to seamlessly integrate vision and language information. Specifically, CoolNet first transforms an image into a textual caption and a graph structure, then dynamically aligns the semantic and syntactic information from both the input sentence and the generated caption, while also modeling object-level visual features. Finally, a cross-modal transformer fuses and models the inter-modality dynamics, giving the network fine-grained cross-modal alignment and fusion capabilities. On the standard Twitter-2015 and Twitter-2017 benchmarks, CoolNet consistently outperforms the state-of-the-art FITE model, improving accuracy and Macro-F1 by 1.43% and 1.38% on Twitter-2015 and by 0.74% and 0.88% on Twitter-2017, respectively, demonstrating the superiority of the CoolNet architecture.
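The cross-modal fusion step described in the abstract can be illustrated as scaled dot-product cross-attention, in which textual token representations query object-level visual features. This is a minimal sketch under assumed dimensions, not the authors' actual implementation; all names and shapes here are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_feats, image_feats):
    """Fuse text and image features via cross-attention.

    text_feats:  (num_tokens,  d) queries from a sentence encoder (assumed)
    image_feats: (num_regions, d) keys/values from a vision encoder (assumed)
    Returns a (num_tokens, d) visually-grounded text representation.
    """
    d = text_feats.shape[-1]
    scores = text_feats @ image_feats.T / np.sqrt(d)  # (tokens, regions)
    weights = softmax(scores, axis=-1)                # attention over regions
    return weights @ image_feats                      # weighted sum of regions

rng = np.random.default_rng(0)
text = rng.standard_normal((5, 64))    # 5 word tokens (illustrative)
image = rng.standard_normal((9, 64))   # 9 object-level regions (illustrative)
fused = cross_modal_attention(text, image)
print(fused.shape)  # (5, 64)
```

In a full cross-modal transformer this operation would use learned query/key/value projections, multiple heads, and residual connections; the sketch keeps only the core attention computation.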