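Sentiment analysis
Context (archaeology)
Computer science
Artificial intelligence
Fusion
Natural language processing
Linguistics
Biology
Philosophy
Paleontology
Authors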
Jiabao Li, Ruyi Liu, Qiguang Miao, Di Wang, Xiangzeng Liu
Identifier
DOI: 10.1109/taffc.2025.3590246
Abstract
Multimodal sentiment analysis (MSA) has become an active research area in recent years, driven by the exponential growth of the internet and social media. It aims to recognize a speaker's sentiment in videos consisting of text, acoustic, and visual cues, and has attracted attention in applications such as smart education, intelligent medicine, and social security. Predominant approaches have been devoted to developing increasingly complicated fusion strategies to learn effective multimodal representations. However, these modalities usually contribute unequally to the MSA task. More specifically, the text modality outperforms the non-verbal modalities owing to its highly condensed semantic information and the maturity of pre-trained language models. Taking full advantage of the text modality while integrating non-verbal, sentiment-relevant contextual information therefore becomes a substantial challenge. Thus, in this paper, we propose a Context Adaptively Enhanced Text-guided Fusion Network, which is embedded in the pre-trained language model. It uses the text modality as a guide to reduce redundancy and exploit sentiment-relevant information, and in turn uses this information to complement the text with non-verbal sentiment contexts. Moreover, a newly designed non-verbal feature enhancement module is introduced to capture long-range dependencies in two directions while substantially removing redundancy and noise. Extensive experiments on two benchmark datasets, CMU-MOSI and CMU-MOSEI, demonstrate competitive performance compared with state-of-the-art methods.
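The idea of using text as a guide over non-verbal features can be illustrated with a generic cross-attention step. The sketch below is not the network described in the abstract: it assumes plain scaled dot-product attention with no learned projections, and the function name `text_guided_attention` and the toy feature values are hypothetical. Text vectors act as queries, non-verbal vectors act as keys and values, and the attended context is added back to the text vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_guided_attention(text, nonverbal):
    """Hypothetical helper: enrich each text vector with non-verbal context.

    text:      list of T vectors of dimension d (used as queries)
    nonverbal: list of N vectors of dimension d (used as keys and values)
    Returns a list of T vectors of dimension d.
    Assumption: no learned weight matrices, purely for illustration.
    """
    d = len(text[0])
    fused = []
    for query in text:
        # Similarity of this text token to every non-verbal frame.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in nonverbal
        ]
        weights = softmax(scores)
        # Weighted sum of non-verbal frames: the text-selected context.
        context = [
            sum(w * frame[j] for w, frame in zip(weights, nonverbal))
            for j in range(d)
        ]
        # Residual connection: the text complemented by non-verbal context.
        fused.append([q + c for q, c in zip(query, context)])
    return fused


# Toy usage with made-up 2-dimensional features.
text_features = [[1.0, 0.0], [0.0, 1.0]]
acoustic_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
enhanced = text_guided_attention(text_features, acoustic_features)
```

Because the text supplies the queries, non-verbal frames that are dissimilar to the text receive low attention weights, which is one simple way to down-weight redundant or noisy frames.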