Keywords
Computer science, Pattern, Artificial intelligence, Sentiment analysis, Semantics (computer science), Deep learning, Representation, Exploit, Modality (human–computer interaction), Word, Natural language processing, Machine learning, Pattern recognition, Linguistics, Social science, Computer security, Sociology, Programming language, Philosophy, Politics, Political science, Law
Authors
Ashima Yadav, Dinesh Kumar Vishwakarma
Abstract
Multimodal sentiment analysis has attracted increasing attention and has broad application prospects. Most existing methods focus on a single modality and therefore fail to handle social media data, which spans multiple modalities. Moreover, in multimodal learning, most works simply combine the two modalities without exploring the complicated correlations between them, which leads to unsatisfactory performance in multimodal sentiment classification. Motivated by this, we propose a Deep Multi-level Attentive network (DMLANet), which exploits the correlation between the image and text modalities to improve multimodal learning. Specifically, we generate a bi-attentive visual map along the spatial and channel dimensions to magnify the representation power of the convolutional neural network. We then model the correlation between image regions and word semantics by applying semantic attention to extract the textual features related to the bi-attentive visual features. Finally, self-attention is employed to automatically select the sentiment-rich multimodal features for classification. Extensive evaluations on four real-world datasets, namely MVSA-Single, MVSA-Multiple, Flickr, and Getty Images, verify the superiority of our method.
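To make the three attention levels concrete, below is a minimal PyTorch sketch of the pipeline the abstract describes: channel-plus-spatial (bi-attentive) weighting of CNN feature maps, semantic attention that uses the attended visual summary to weight word features, and self-attention over the stacked modality features before classification. All module names, dimensions, the CBAM-style attention formulation, and the three-class sentiment output are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only; names, dimensions, and formulations are assumptions,
# not the DMLANet reference implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiAttentiveVisual(nn.Module):
    """Channel attention followed by spatial attention over CNN feature maps."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, feat):                      # feat: (B, C, H, W)
        # Channel attention: pool spatial dims, then re-weight channels.
        avg = feat.mean(dim=(2, 3))               # (B, C)
        mx = feat.amax(dim=(2, 3))                # (B, C)
        ch = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        feat = feat * ch[:, :, None, None]
        # Spatial attention: pool the channel dim, then re-weight locations.
        sp = torch.cat([feat.mean(1, keepdim=True),
                        feat.amax(1, keepdim=True)], dim=1)
        return feat * torch.sigmoid(self.spatial_conv(sp))

class SemanticAttention(nn.Module):
    """Attend over word features, using the bi-attentive visual summary as query."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)

    def forward(self, visual, words):             # visual: (B, D), words: (B, T, D)
        q = self.query(visual).unsqueeze(1)       # (B, 1, D)
        k = self.key(words)                       # (B, T, D)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5      # (B, T)
        alpha = F.softmax(scores, dim=-1)
        return (alpha.unsqueeze(-1) * words).sum(1)       # (B, D) text summary

class DMLANetSketch(nn.Module):
    def __init__(self, channels=512, dim=512, num_classes=3):
        super().__init__()
        self.bi_visual = BiAttentiveVisual(channels)
        self.vis_proj = nn.Linear(channels, dim)
        self.semantic = SemanticAttention(dim)
        self.self_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, cnn_feat, word_feat):       # (B, C, H, W), (B, T, D)
        v = self.bi_visual(cnn_feat).mean(dim=(2, 3))     # pooled visual (B, C)
        v = self.vis_proj(v)                              # (B, D)
        t = self.semantic(v, word_feat)                   # visually guided text (B, D)
        # Self-attention over the stacked modality features picks out the
        # sentiment-rich components before classification.
        m = torch.stack([v, t], dim=1)                    # (B, 2, D)
        fused, _ = self.self_attn(m, m, m)
        return self.classifier(fused.mean(1))             # (B, num_classes)

# Example with random inputs: a batch of 2 images (512x7x7 CNN maps) and
# 20-token texts with 512-dimensional word features.
logits = DMLANetSketch()(torch.randn(2, 512, 7, 7), torch.randn(2, 20, 512))
```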