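Pattern
Computer science
Hypergraph
Semantics (computer science)
Representation (politics)
Artificial intelligence
Matching (statistics)
Modality (human-computer interaction)
Theoretical computer science
Space (punctuation)
Semantic matching
Natural language processing
Mathematics
Social science
Statistics
Discrete mathematics
Sociology
Politics
Political science
Law
Programming language
Operating system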
Authors
Eun-Sol Kim, Woo Young Kang, Kyoung-Woon On, Yu-Jung Heo, Byoung-Tak Zhang
Identifier
DOI:10.1109/cvpr42600.2020.01459
Abstract
One of the fundamental problems that arise in multimodal learning tasks is the disparity of information levels between different modalities. To resolve this problem, we propose Hypergraph Attention Networks (HANs), which define a common semantic space among the modalities with symbolic graphs and extract a joint representation of the modalities based on a co-attention map constructed in the semantic space. HANs proceed in four steps: constructing the common semantic space with symbolic graphs of each modality, matching the semantics between sub-structures of the symbolic graphs, constructing co-attention maps between the graphs in the semantic space, and integrating the multimodal inputs using the co-attention maps to obtain the final joint representation. From a qualitative analysis on two Visual Question Answering datasets, we find that 1) aligning the information levels between the modalities is important, and 2) symbolic graphs are a powerful way to represent the information of the low-level signals in this alignment. Quantitatively, HANs improve the state-of-the-art accuracy on the GQA dataset from 54.6% to 61.88% using only the symbolic information.
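The last two steps named in the abstract (building a co-attention map, then integrating the inputs with it) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how sub-structures are embedded or how the map is computed, so the dot-product similarity, the single softmax over all pairs, the element-wise fusion, and every name below (`co_attention`, `joint_representation`, the toy embeddings) are assumptions made purely for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def co_attention(edges_a, edges_b):
    """Assumed co-attention: softmax over pairwise dot-product similarities.

    edges_a and edges_b are lists of embedding vectors, one per graph
    sub-structure of each modality (hypothetical inputs).
    """
    scores = [[dot(a, b) for b in edges_b] for a in edges_a]
    peak = max(max(row) for row in scores)  # subtracted for numerical stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def joint_representation(edges_a, edges_b, attn):
    """Assumed fusion: attention-weighted sum of element-wise products."""
    dim = len(edges_a[0])
    joint = [0.0] * dim
    for i, a in enumerate(edges_a):
        for j, b in enumerate(edges_b):
            for k in range(dim):
                joint[k] += attn[i][j] * a[k] * b[k]
    return joint


# Toy, made-up embeddings that already live in one shared 3-d semantic space.
image_edges = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
question_edges = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

attn = co_attention(image_edges, question_edges)
joint = joint_representation(image_edges, question_edges, attn)
```

In this toy setup the pairs of sub-structures that point in the same direction receive the largest attention weights, so only the dimensions on which both modalities agree contribute to `joint`.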