Conversation
Computer science
Artificial neural network
Distillation
Artificial intelligence
Graph
Speech recognition
Natural language processing
Pattern recognition (psychology)
Theoretical computer science
Psychology
Communication
Chemistry
Organic chemistry
Authors
Yijing Dai, Yingjian Li, Dongpeng Chen, Jinxing Li, Guangming Lu
Identifier
DOI:10.1109/tcsvt.2024.3405406
Abstract
Graph Neural Networks (GNNs) have attracted increasing attention for multimodal Emotion Recognition in Conversation (ERC) due to their strong performance in contextual understanding. However, most existing GNN-based methods suffer from two challenges: 1) How to explore and propagate appropriate information in a conversational graph. Typical GNNs in ERC neglect to mine the emotion commonality and discrepancy in the local neighborhood, leading them to learn similar embeddings for connected nodes. However, the embeddings of these connected nodes should be distinguishable, as they belong to different speakers with different emotions. 2) Most existing works apply simple concatenation or a co-occurrence prior for modality combination, failing to fully capture the emotional information of multiple modalities in relationship modeling. In this paper, we propose a multimodal Decoupled Distillation Graph Neural Network (D2GNN) to address the above challenges. Specifically, D2GNN decouples the input features into emotion-aware and emotion-agnostic ones at the emotion category level, aiming to capture emotion commonality and implicit emotion information, respectively. Moreover, we design a new message passing mechanism that separately propagates emotion-aware and -agnostic knowledge between nodes according to speaker dependency in two GNN-based modules, exploring the correlations of utterances and alleviating the similarity of embeddings. Furthermore, a multimodal distillation unit is employed to obtain distinguishable embeddings by aggregating unimodal decoupled features. Experimental results on two ERC benchmarks demonstrate the superiority of the proposed model. Code is available at https://github.com/gityider/D2GNN.
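The abstract describes decoupling utterance features into emotion-aware and emotion-agnostic views and propagating them separately over the conversation graph. The following is a minimal, hypothetical PyTorch sketch of that general idea only; the module names, dimensions, adjacency construction, and fusion step are illustrative assumptions and do not reproduce the authors' D2GNN implementation (available at the repository above).

# Hypothetical sketch (not the authors' code): decouple node features into two
# views and propagate each view separately over a toy conversation graph.
import torch
import torch.nn as nn


class DecoupledMessagePassing(nn.Module):
    """Toy decoupled propagation over a dense conversation-graph adjacency."""

    def __init__(self, dim: int):
        super().__init__()
        # Two projections split each utterance feature into complementary views.
        self.aware_proj = nn.Linear(dim, dim)     # emotion-aware branch
        self.agnostic_proj = nn.Linear(dim, dim)  # emotion-agnostic branch
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (num_utterances, dim) unimodal utterance features
        # adj: (num_utterances, num_utterances) adjacency, assumed to be built
        #      from speaker dependency and a context window.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        norm_adj = adj / deg  # simple row-normalized neighborhood aggregation

        aware = torch.relu(self.aware_proj(x))
        agnostic = torch.relu(self.agnostic_proj(x))

        # Propagate the two views separately, then fuse, so that connected
        # nodes do not collapse onto near-identical embeddings.
        aware_msg = norm_adj @ aware
        agnostic_msg = norm_adj @ agnostic
        return self.fuse(torch.cat([aware_msg, agnostic_msg], dim=-1))


if __name__ == "__main__":
    # 5 utterances, 128-d features, fully connected toy conversation graph.
    x = torch.randn(5, 128)
    adj = torch.ones(5, 5)
    out = DecoupledMessagePassing(128)(x, adj)
    print(out.shape)  # torch.Size([5, 128])

In the paper this separation is done per emotion category and followed by a multimodal distillation unit that aggregates the unimodal decoupled features; the sketch above only shows why keeping the two information streams apart during message passing can keep neighboring embeddings distinguishable.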