Knowing What it is: Semantic-Enhanced Dual Attention Transformer

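Computer science, Closed captioning, Question answering, Artificial intelligence, Natural language processing, Information retrieval, Visualization, Image (mathematics)
Authors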
Yiwei Ma,Jiayi Ji,Xiaoshuai Sun,Yiyi Zhou,Yongjian Wu,Feiyue Huang,Rongrong Ji
Source
Journal: IEEE Transactions on Multimedia [Institute of Electrical and Electronics Engineers]
Volume/Issue: 25: 3723-3736  Cited by: 26
Identifier
DOI:10.1109/tmm.2022.3164787
Abstract

Attention has become an indispensable component of models for various multimedia tasks such as Image Captioning (IC) and Visual Question Answering (VQA). However, most existing attention modules are designed to capture spatial dependency and remain insufficient for semantic understanding, e.g., the categories of objects and their attributes, which is also critical for image captioning. To compensate for this defect, we propose a novel attention module termed Channel-wise Attention Block (CAB) to model channel-wise dependency for both the visual and the linguistic modality, thereby improving semantic learning and multi-modal reasoning simultaneously. Specifically, CAB has two novel designs to tackle the high overhead of channel-wise attention: the reduction-reconstruction block structure and the gating-based attention prediction. Based on CAB, we further propose a novel Semantic-enhanced Dual Attention Transformer (termed SDATR), which combines the merits of spatial and channel-wise attention. To validate SDATR, we conduct extensive experiments on the MS COCO dataset and achieve new state-of-the-art performance of 134.5 CIDEr on the COCO Karpathy test split and 136.0 CIDEr on the official online testing server. To examine the generalization of SDATR, we also apply it to the task of visual question answering, where superior performance gains are also observed. The code and models are publicly available at https://github.com/xmu-xiaoma666/SDATR.
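The abstract names two ideas behind CAB, a reduction-reconstruction structure and gating-based attention prediction, without giving details. As a rough illustration only, the sketch below shows a generic squeeze-and-excitation-style channel gate: channels are pooled over tokens, projected down by a reduction ratio, projected back up, and squashed by a sigmoid into per-channel gates. This is not the authors' CAB; the function names, the mean pooling, the ReLU, the sizes `N`, `C`, `R`, and the random weights are all assumptions made for the example. The linked repository is the reference for the actual method.

```python
import math
import random

def matvec(w, x):
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def channel_gate(tokens, w_reduce, w_reconstruct):
    """Re-weight each channel of `tokens` (N tokens x C channels) by a gate in (0, 1)."""
    n, c = len(tokens), len(tokens[0])
    # Pool: average each channel over all tokens, giving one descriptor of length C.
    pooled = [sum(t[j] for t in tokens) / n for j in range(c)]
    # Reduction: project C channels down to C // R, then apply ReLU.
    hidden = [max(0.0, h) for h in matvec(w_reduce, pooled)]
    # Reconstruction and gating: project back to C channels, then apply a sigmoid.
    gates = [1.0 / (1.0 + math.exp(-g)) for g in matvec(w_reconstruct, hidden)]
    # Scale every token's channels by the shared per-channel gates.
    gated = [[t[j] * gates[j] for j in range(c)] for t in tokens]
    return gated, gates

random.seed(0)
N, C, R = 4, 8, 4  # hypothetical sizes: tokens, channels, reduction ratio
tokens = [[random.gauss(0, 1) for _ in range(C)] for _ in range(N)]
# Randomly initialised (untrained) weights, for illustration only.
w_reduce = [[random.gauss(0, 0.5) for _ in range(C)] for _ in range(C // R)]
w_reconstruct = [[random.gauss(0, 0.5) for _ in range(C // R)] for _ in range(C)]

gated, gates = channel_gate(tokens, w_reduce, w_reconstruct)
print(len(gates), len(gated), len(gated[0]))
```

In this sketch the two projections hold 2 * C * (C // R) weights instead of the C * C needed by a single full channel-to-channel map, which is one plausible reading of how a reduction step lowers the overhead the abstract mentions.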