Learning Complementary Spatial–Temporal Transformer for Video Salient Object Detection

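Computer science, Artificial intelligence, Computer vision, Transformer, Salience, Object-based, Video tracking, Object (grammar), Engineering, Electrical engineering, Voltage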
Authors
Nian Liu,Kepan Nan,Wangbo Zhao,Xiwen Yao,Junwei Han
Source
Journal: IEEE Transactions on Neural Networks and Learning Systems [Institute of Electrical and Electronics Engineers]
Volume (issue): 35 (8): 10663-10673 Cited by: 19
Identifier
DOI:10.1109/tnnls.2023.3243246
Abstract

Besides combining appearance and motion information, another crucial factor for video salient object detection (VSOD) is to mine spatial–temporal (ST) knowledge, including complementary long–short temporal cues and global–local spatial context from neighboring frames. However, the existing methods only explored part of them and ignored their complementarity. In this article, we propose a novel complementary ST transformer (CoSTFormer) for VSOD, which has a short-global branch and a long-local branch to aggregate complementary ST contexts. The former integrates the global context from the neighboring two frames using dense pairwise attention, while the latter is designed to fuse long-term temporal information from more consecutive frames with local attention windows. In this way, we decompose the ST context into a short-global part and a long-local part and leverage the powerful transformer to model the context relationship and learn their complementarity. To solve the contradiction between local window attention and object motion, we propose a novel flow-guided window attention (FGWA) mechanism to align the attention windows with object and camera movements. Furthermore, we deploy CoSTFormer on fused appearance and motion features, thus enabling the effective combination of all three VSOD factors. Besides, we present a pseudo video generation method to synthesize sufficient video clips from static images for training ST saliency models. Extensive experiments have verified the effectiveness of our method and illustrated that we achieve new state-of-the-art results on several benchmark datasets.
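The abstract does not give the details of flow-guided window attention, so the following is only a minimal sketch of the general idea it describes: shifting a local attention window in a neighboring frame by the motion at the query position, so that the window follows the object. The function names `shifted_window` and `attend`, the single per-pixel flow vector, the window radius, and the toy 5x5 feature map are all assumptions made for illustration, not the authors' implementation.

```python
import math


def shifted_window(y, x, flow, radius, height, width):
    """Positions of a local window in a neighboring frame.

    The window is centered at (y, x) displaced by the flow vector (dy, dx)
    and clipped to the frame bounds. A zero flow gives a plain local window.
    """
    dy, dx = flow
    cy = min(max(int(round(y + dy)), 0), height - 1)
    cx = min(max(int(round(x + dx)), 0), width - 1)
    rows = range(max(cy - radius, 0), min(cy + radius, height - 1) + 1)
    cols = range(max(cx - radius, 0), min(cx + radius, width - 1) + 1)
    return [(r, c) for r in rows for c in cols]


def attend(query, feats, positions):
    """Scaled dot-product attention of one query vector over the given positions."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, feats[r][c])) / math.sqrt(d)
        for r, c in positions
    ]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    out = [
        sum(w * feats[r][c][i] for w, (r, c) in zip(weights, positions))
        for i in range(d)
    ]
    return out, weights


# Toy neighboring frame: the "object" feature sits at (3, 3), background elsewhere.
H = W = 5
feats = [[[1.0, 0.0] if (r, c) == (3, 3) else [0.0, 1.0] for c in range(W)]
         for r in range(H)]

query = [4.0, 0.0]  # query taken at (1, 1), where the object was in the current frame
static = shifted_window(1, 1, (0, 0), 1, H, W)   # plain local window
guided = shifted_window(1, 1, (2, 2), 1, H, W)   # window moved along the flow

out, weights = attend(query, feats, guided)
best = guided[weights.index(max(weights))]
```

In this toy case the plain window around (1, 1) never reaches the object at (3, 3), while the flow-shifted window contains it and attention concentrates there. That mismatch between a fixed local window and a moving object is the one the abstract says FGWA is designed to resolve.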