Discriminative
Discourse
Relation (database)
Computer science
Sentiment analysis
Feature (linguistics)
Representation (politics)
Artificial intelligence
Natural language processing
Pattern
Modality (human–computer interaction)
Feature learning
Focus (optics)
Class (philosophy)
Word (group theory)
Pattern recognition (psychology)
Data mining
Linguistics
Physics
Philosophy
Sociology
Optics
Law
Politics
Social science
Political science
Authors
Zemin Tang, Xiaoxue Li, Chao Wang, Yangfan Li, Cen Chen, Kenli Li
Identifier
DOI:10.1016/j.ins.2023.119125
Abstract
Modality representation learning is a critical issue in multimodal sentiment analysis (MSA). A good sentiment representation should contain as much effective information as possible while being discriminative enough to be recognized reliably. Previous attention-based MSA methods rely mainly on word-level feature interactions to capture intra-modality and inter-modality relations, which may lose essential sentiment information. Furthermore, they focus primarily on information fusion and give too little importance to feature discrimination. To address these challenges, we propose a modal-utterance-temporal attention network with multimodal sentiment loss (MUTA-Net) for learning discriminative multi-relation representations, whose two core units are the modal-utterance-temporal attention (MUTA) and the multimodal sentiment loss (MMSL). First, MUTA incorporates utterance-level feature vectors into the interactions of different modalities, which helps extract more useful relationships because utterance-level vectors may contain sentiment information complementary to word-level vectors. Second, MMSL is designed to achieve a large inter-class distance and a small intra-class distance simultaneously in multimodal scenes, enhancing the discriminative power of the feature representations. Experiments on four public multimodal datasets show that MUTA-Net significantly outperforms previous baselines.
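The abstract does not give MMSL's exact formulation, but the stated goal (large inter-class distance, small intra-class distance for fused multimodal features) can be illustrated with a common construction: a center-loss-style intra-class term plus a margin-based penalty between class centers, added to the usual classification loss. The PyTorch sketch below shows that idea only; the class name, hyperparameters, and weighting are illustrative assumptions, not the paper's method.

```python
# Illustrative sketch of a discriminative objective in the spirit of MMSL:
# pull each fused multimodal feature toward its class center (small intra-class
# distance) and push different class centers apart by a margin (large
# inter-class distance). All names and hyperparameters here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiscriminativeSentimentLoss(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int, margin: float = 1.0,
                 intra_weight: float = 0.1, inter_weight: float = 0.1):
        super().__init__()
        # Learnable class centers in the fused multimodal feature space.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.margin = margin
        self.intra_weight = intra_weight
        self.inter_weight = inter_weight

    def forward(self, features: torch.Tensor, logits: torch.Tensor,
                labels: torch.Tensor) -> torch.Tensor:
        # Standard classification term.
        ce = F.cross_entropy(logits, labels)

        # Intra-class term: squared distance of each feature to its own center.
        own_centers = self.centers[labels]                    # (B, D)
        intra = (features - own_centers).pow(2).sum(dim=1).mean()

        # Inter-class term: hinge penalty when two distinct class centers are
        # closer than the margin.
        dists = torch.cdist(self.centers, self.centers, p=2)  # (C, C)
        off_diag = ~torch.eye(self.centers.size(0), dtype=torch.bool,
                              device=dists.device)
        inter = F.relu(self.margin - dists[off_diag]).mean()

        return ce + self.intra_weight * intra + self.inter_weight * inter


if __name__ == "__main__":
    # Random tensors stand in for fused multimodal features and logits.
    batch, feat_dim, num_classes = 8, 64, 3
    criterion = DiscriminativeSentimentLoss(num_classes, feat_dim)
    feats = torch.randn(batch, feat_dim)
    logits = torch.randn(batch, num_classes)
    labels = torch.randint(0, num_classes, (batch,))
    loss = criterion(feats, logits, labels)
    loss.backward()
```

In this kind of setup the intra-class and inter-class weights trade off compactness against separation; the paper's actual MMSL may combine these goals differently.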