Sentiment Analysis
Computer Science
Natural Language Processing
Artificial Intelligence
Data Science
Authors
Wenbin Wang, Liang Ding, Li Shen, Yong Luo, Han Hu, Dacheng Tao
Source
Journal: Cornell University - arXiv
Date: 2024-01-01
Identifier
DOI: 10.48550/arxiv.2401.06659
Abstract
Sentiment analysis is rapidly advancing by utilizing various data modalities (e.g., text, image). However, most previous works relied on superficial information, neglecting contextual world knowledge (e.g., background information derived from but extending beyond the given image-text pairs), which restricts their ability to achieve better multimodal sentiment analysis (MSA). In this paper, we propose a plug-in framework named WisdoM, which leverages contextual world knowledge induced from large vision-language models (LVLMs) for enhanced MSA. WisdoM uses LVLMs to comprehensively analyze both images and their corresponding texts, simultaneously generating pertinent context. To reduce noise in the generated context, we also introduce a training-free contextual fusion mechanism. Experiments across diverse granularities of MSA tasks consistently demonstrate that our approach brings substantial improvements (an average +1.96% F1 score across five advanced methods) over several state-of-the-art methods.
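As a rough illustration of the pipeline the abstract describes, the sketch below outlines the two stages in Python: an LVLM generates world-knowledge context for an image-text pair, and predictions made with and without that context are combined training-free. All names here (generate_context, fuse_logits, the alpha-weighted mixing rule, and the model callables) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a WisdoM-style pipeline, assuming:
#  - `lvlm(image=..., prompt=...)` returns a context string,
#  - `msa_model(text, image_path)` returns class logits as a list of floats.
# The weighted-logit fusion below is an assumed stand-in for the paper's
# training-free contextual fusion mechanism.

from dataclasses import dataclass

@dataclass
class Sample:
    text: str
    image_path: str

def generate_context(lvlm, sample: Sample) -> str:
    """Stage 1 (assumed): prompt an LVLM to surface background
    knowledge relevant to the image-text pair."""
    prompt = (
        "Describe background world knowledge relevant to this image "
        "and text: " + sample.text
    )
    return lvlm(image=sample.image_path, prompt=prompt)

def fuse_logits(logits_plain, logits_with_context, alpha=0.5):
    """Stage 2 (assumed): training-free fusion that mixes predictions
    made with and without the generated context, damping noisy context."""
    return [
        (1 - alpha) * p + alpha * c
        for p, c in zip(logits_plain, logits_with_context)
    ]

def predict_sentiment(msa_model, lvlm, sample: Sample):
    """Run the base MSA model twice and fuse the two predictions."""
    context = generate_context(lvlm, sample)
    logits_plain = msa_model(sample.text, sample.image_path)
    logits_ctx = msa_model(context + " " + sample.text, sample.image_path)
    return fuse_logits(logits_plain, logits_ctx)
```

Because the fusion happens at the prediction level and needs no extra training, a sketch like this slots in as a plug-in around any existing MSA model, which matches the plug-in framing in the abstract.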