Keywords
Computer science
Modality (human-computer interaction)
Sentiment analysis
Artificial intelligence
Generalization
Margin (machine learning)
Representation
Machine learning
Feature learning
Natural language processing
Authors
Sijie Mai,Ying Zeng,Shuangjia Zheng,Haifeng Hu
Identifier
DOI: 10.1109/taffc.2022.3172360
Abstract
The wide deployment of smart devices makes multimodal data broadly available, and such data can be exploited in many tasks. In multimodal sentiment analysis, most previous work focuses on exploring intra- and inter-modal interactions. However, training a network with cross-modal information (language, audio, and visual) remains challenging due to the modality gap. Moreover, while learning the dynamics within each sample has drawn great attention, learning inter-sample and inter-class relationships has been neglected, and the limited size of existing datasets restricts the generalization ability of models. To address these issues, we propose HyCon, a novel framework for hybrid contrastive learning of tri-modal representations. Specifically, we simultaneously perform intra-/inter-modal contrastive learning and semi-contrastive learning, with which the model can fully explore cross-modal interactions, learn inter-sample and inter-class relationships, and reduce the modality gap. In addition, a refinement term and a modality margin are introduced to enable better learning of unimodal pairs, and a pair selection mechanism identifies and assigns weights to informative negative and positive pairs. HyCon naturally generates a large number of training pairs, which improves generalization and mitigates the negative effect of limited data. Extensive experiments demonstrate that our method outperforms baselines on multimodal sentiment analysis and emotion recognition.
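To make the core idea concrete, below is a minimal sketch of an inter-modal contrastive loss in the spirit described by the abstract: embeddings of the same sample from two modalities form positive pairs, and other samples in the batch serve as negatives. This is not the authors' released code; the InfoNCE-style formulation, the `temperature` parameter, and the way the modality margin is applied to positive pairs are all illustrative assumptions, and the refinement term and pair selection mechanism are omitted.

```python
# Illustrative sketch (assumptions noted above), not the paper's exact method.
import torch
import torch.nn.functional as F

def inter_modal_contrastive_loss(z_a, z_b, temperature=0.1, margin=0.0):
    """InfoNCE-style loss between two modalities' embeddings.

    z_a, z_b: (batch, dim) representations of the same samples from two
    modalities (e.g., language and audio). `margin` is subtracted from the
    positive-pair similarity, a simple stand-in for a modality margin.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    sim = z_a @ z_b.t() / temperature  # (batch, batch) similarity logits
    batch = sim.size(0)
    # Penalize only the diagonal (positive pairs) with the margin.
    sim = sim - margin * torch.eye(batch, device=sim.device) / temperature
    targets = torch.arange(batch, device=sim.device)
    # Symmetric loss: match a -> b and b -> a.
    return 0.5 * (F.cross_entropy(sim, targets)
                  + F.cross_entropy(sim.t(), targets))

if __name__ == "__main__":
    # Random stand-ins for language/audio/visual encoder outputs.
    torch.manual_seed(0)
    z_l, z_a, z_v = (torch.randn(8, 64) for _ in range(3))
    loss = (inter_modal_contrastive_loss(z_l, z_a)
            + inter_modal_contrastive_loss(z_l, z_v)
            + inter_modal_contrastive_loss(z_a, z_v)) / 3
    print(loss.item())
```

Averaging the loss over the three modality pairings, as in the usage example, is one plausible way to couple all three modalities; every in-batch cross-sample pair acts as a negative, which is how contrastive objectives of this kind generate many training pairs from limited data.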