Computer science
Artificial intelligence
Natural language processing
Representation (politics)
Benchmark (surveying)
Sentiment analysis
Discriminative
Process (computing)
Machine learning
Geodesy
Politics
Political science
Law
Geography
Operating system
Authors
Yan Li, Xiangyuan Lan, Haifeng Chen, Ke Lü, Dongmei Jiang
Abstract
Multimodal sentiment analysis aims to predict sentiments from multimodal signals such as audio, video, and text. Existing methods often rely on Pre-trained Language Models (PLMs) to extract semantic information from textual data, but lack an in-depth understanding of the logical relationships within the text modality. This paper introduces the Multimodal PEAR Chain-of-Thought (MM-PEAR-CoT) reasoning for multimodal sentiment analysis. Inspired by the human thought process when solving complex problems, the PEAR (Preliminaries, quEstion, Answer, Reason) chain-of-thought prompt is first proposed to induce Large Language Models (LLMs) to generate text-based reasoning processes and zero-shot sentiment prediction results. However, text-based chain-of-thought reasoning is not always reliable and might contain irrational steps due to the hallucinations of large language models. To address this, we further design the Cross-Modal Filtering and Fusion (CMFF) module. The filtering submodule utilizes audio and visual modalities to suppress irrational steps in the chain of thought, while the fusion submodule integrates high-level reasoning information and cross-modal complementary information during semantic representation learning. Experimental results on two multimodal sentiment analysis benchmark datasets show that high-level reasoning information can help learn discriminative text representations, and cross-modal complementary information can prevent the model from being misled by unreasonable steps in the chain of thought. MM-PEAR-CoT achieves the best results on both datasets, with improvements of 2.2% and 1.7% in binary classification accuracy on the CMU-MOSI and CMU-MOSEI datasets, respectively. To the best of our knowledge, this is the first study to apply chain-of-thought reasoning to multimodal sentiment analysis.
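To make the PEAR prompting idea concrete, the sketch below shows one way a four-part Preliminaries / quEstion / Answer / Reason prompt could be assembled for zero-shot sentiment prediction on an utterance transcript. The exact field wording, the sentiment scale, and the `send_to_llm` helper are assumptions for illustration and are not taken from the paper.

```python
# Illustrative sketch of a PEAR-style (Preliminaries, quEstion, Answer, Reason)
# chain-of-thought prompt for zero-shot sentiment prediction.
# The section wording and send_to_llm() are hypothetical placeholders.

def build_pear_prompt(transcript: str) -> str:
    """Assemble the four PEAR sections into a single prompt string."""
    return (
        "Preliminaries: You are an expert in sentiment analysis. "
        "Sentiment is rated from -3 (strongly negative) to +3 (strongly positive).\n"
        f'Question: What is the sentiment of the speaker in the utterance: "{transcript}"?\n'
        "Answer: Give a single sentiment score first.\n"
        "Reason: Then explain your reasoning step by step."
    )


def send_to_llm(prompt: str) -> str:
    """Hypothetical placeholder for a large language model call (e.g., an API request)."""
    raise NotImplementedError("Replace with an actual LLM query.")


if __name__ == "__main__":
    prompt = build_pear_prompt("The movie was slow, but the ending really surprised me.")
    print(prompt)  # inspect the assembled prompt
    # reply = send_to_llm(prompt)  # would return the score and the reasoning chain
```

In the MM-PEAR-CoT pipeline described above, the returned reasoning chain would then be filtered and fused with the audio and visual streams by the CMFF module rather than used directly.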