Dual self-attention with co-attention networks for visual question answering

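Keywords: Computer science; Question answering; Artificial intelligence; Sentence; Dual (grammatical number); Convolutional neural network; Task (project management); Word (group theory); Attention network; Pattern recognition (psychology); Feature (linguistics); Natural language processing; Machine learning; Economics; Management; Art; Literature; Linguistics; Philosophy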
Authors
Yun Liu,Xiaoming Zhang,Qianyun Zhang,Chaozhuo Li,Feiran Huang,Xianghong Tang,Zhoujun Li
Source
Journal: Pattern Recognition [Elsevier BV]
Volume/Issue: 117: 107956-107956 Citations: 52
Identifier
DOI: 10.1016/j.patcog.2021.107956
Abstract

Visual Question Answering (VQA) is an important task in joint vision-and-language understanding and has attracted wide interest. Previous VQA methods generally use Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to extract visual and textual features, respectively, and then explore the correlation between the two sets of features to infer the answer. However, CNNs mainly focus on extracting local spatial information, while RNNs pay more attention to exploiting sequential architecture and long-range dependencies. It is difficult for either to integrate local features with their global dependencies to learn more effective representations of the image and question. To address this problem, we propose a novel model, Dual Self-Attention with Co-Attention networks (DSACA), for VQA. It models the internal dependencies of the spatial and sequential structures, respectively, using the newly proposed self-attention mechanism. Specifically, DSACA contains three main submodules. The visual self-attention module selectively aggregates the visual features at each region by a weighted sum of the features at all positions. The textual self-attention module automatically emphasizes interdependent word features by integrating associated features among the words of the sentence. In addition, the visual-textual co-attention module explores the close correlation between the visual and textual features learned by the self-attention modules. The three modules are integrated into an end-to-end framework to infer the answer. Extensive experiments on three commonly used VQA datasets confirm the favorable performance of DSACA compared with state-of-the-art methods.
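
The abstract gives no equations, so the following is only a minimal sketch of the two ideas it names: self-attention as a weighted sum of the features at all positions, and co-attention between word and region features. It assumes plain scaled dot-product scoring with no learned projections; the function names (`self_attention`, `co_attention`) and the toy feature vectors are hypothetical and are not taken from the DSACA paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Combine vectors row-wise using the given attention weights."""
    dim = len(vectors[0])
    return [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(dim)]


def self_attention(features):
    """Assumed form: each output is a softmax-weighted sum of ALL inputs.

    Works for image regions (visual) or word vectors (textual) alike.
    """
    scale = math.sqrt(len(features[0]))
    outputs = []
    for query in features:
        weights = softmax([dot(query, key) / scale for key in features])
        outputs.append(weighted_sum(weights, features))
    return outputs


def co_attention(visual, textual):
    """Assumed form: each word attends over all visual regions.

    Returns one attended visual vector per word.
    """
    scale = math.sqrt(len(visual[0]))
    attended = []
    for word in textual:
        weights = softmax([dot(word, region) / scale for region in visual])
        attended.append(weighted_sum(weights, visual))
    return attended


# Toy inputs (hypothetical): three image regions, two question words.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
words = [[1.0, 0.0], [0.0, 1.0]]
refined_regions = self_attention(regions)
refined_words = self_attention(words)
fused = co_attention(refined_regions, refined_words)
```

In this sketch the self-attended features of each modality are computed first and then passed to the co-attention step, mirroring the order described in the abstract. A trained model would normally add learned query, key, and value projections, which are omitted here.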
