Computer science
Artificial intelligence
Graph
Convolutional neural network
Natural language processing
Deep learning
Sentiment analysis
Attention network
Machine learning
Pattern recognition (psychology)
Semantics (computer science)
Authors
Jian Huang, Zehang Lin, Zhenguo Yang, Wenyin Liu
Source
Venue: International Conference on Multimodal Interfaces
Date: 2021-10-18
Pages: 239-247
Identifier
DOI:10.1145/3462244.3479939
Abstract
In this paper, we propose a temporal graph convolutional network (TGCN) to recognize sentiments from the language (textual), acoustic, and visual (facial expression) modalities. TGCN constructs a modality-specific graph whose nodes are the aligned segments in the multimodal utterances and whose edges are weighted according to the distances between their features, in order to learn node embeddings that capture the sequential semantics underlying the utterances. In particular, we use positional encoding, interleaving sine and cosine embeddings, to encode the positions of the segments in the utterances into their features. Given the modality-specific embeddings of the segments, we apply an attention mechanism over the segments to capture the sentiment-related ones and obtain unified utterance embeddings. Furthermore, we fuse the attended embeddings of the multimodal utterances and apply attention to capture their interactions. Finally, the fused embeddings are concatenated with the raw features for sentiment prediction. Extensive experiments on three publicly available datasets show that TGCN outperforms state-of-the-art methods.
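The abstract names two concrete mechanisms: an interleaved sine/cosine positional encoding over segment positions, and a modality-specific graph whose edge weights follow pairwise feature distances. Below is a minimal NumPy sketch of both, not the authors' implementation; the Gaussian-kernel weighting, the bandwidth choice, and the tensor shapes are assumptions, since the abstract does not specify them.

```python
# Minimal sketch (assumptions noted inline), not the paper's code.
import numpy as np

def positional_encoding(num_segments: int, dim: int) -> np.ndarray:
    """Interleaved sine/cosine positional encoding over segment positions."""
    positions = np.arange(num_segments)[:, None]              # (n, 1)
    freqs = np.exp(np.arange(0, dim, 2) * (-np.log(10000.0) / dim))
    pe = np.zeros((num_segments, dim))
    pe[:, 0::2] = np.sin(positions * freqs)                   # even dims: sine
    pe[:, 1::2] = np.cos(positions * freqs)                   # odd dims: cosine
    return pe

def distance_weighted_adjacency(feats: np.ndarray) -> np.ndarray:
    """Edge weights from pairwise feature distances.

    The abstract only says edges are weighted "according to the distances
    between their features"; the exact function is unspecified, so a
    Gaussian kernel over squared Euclidean distances is assumed here.
    """
    diff = feats[:, None, :] - feats[None, :, :]              # (n, n, dim)
    dist2 = (diff ** 2).sum(axis=-1)                          # squared distances
    sigma2 = dist2.mean() + 1e-8                              # bandwidth (assumed)
    return np.exp(-dist2 / sigma2)                            # closer => heavier edge

# Usage: encode segment order into one modality's features, then build its graph.
feats = np.random.randn(12, 64)                               # 12 aligned segments
feats = feats + positional_encoding(*feats.shape)
adj = distance_weighted_adjacency(feats)                      # (12, 12) edge weights
```

The diagonal of the adjacency matrix comes out as 1 (zero self-distance), which plays the role of the self-loops commonly added before graph convolution; whether TGCN handles self-loops this way is not stated in the abstract.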