An Effective Video Transformer With Synchronized Spatiotemporal and Spatial Self-Attention for Action Recognition

Authors
Saghir Alfasly, Charles K. Chui, Qingtang Jiang, Jian Lü, Xu Chen
Source
Journal: IEEE Transactions on Neural Networks and Learning Systems [Institute of Electrical and Electronics Engineers]
Volume/Issue: 35 (2): 2496-2509; cited by: 37
Identifier
DOI: 10.1109/tnnls.2022.3190367
Abstract

Convolutional neural networks (CNNs) have come to dominate vision-based deep neural network structures in both image and video models over the past decade. However, convolution-free vision Transformers (ViTs) have recently outperformed CNN-based models in image recognition. Despite this progress, building and designing video Transformers have not yet received the same research attention as image-based Transformers. While there have been attempts to build video Transformers by adapting image-based Transformers for video understanding, these Transformers still lack efficiency due to the large gap between CNN-based models and Transformers in the number of parameters and the training settings. In this work, we propose three techniques to improve video understanding with video Transformers. First, to derive better spatiotemporal feature representation, we propose a new spatiotemporal attention scheme, termed synchronized spatiotemporal and spatial attention (SSTSA), which derives the spatiotemporal features with temporal and spatial multiheaded self-attention (MSA) modules. It also preserves the best spatial attention by another spatial self-attention module in parallel, thereby resulting in an effective Transformer encoder. Second, a motion spotlighting module is proposed to embed the short-term motion of the consecutive input frames into the regular RGB input, which is then processed with a single-stream video Transformer. Third, a simple intraclass frame interlacing method for the input clips is proposed that serves as an effective video augmentation method. Finally, our proposed techniques have been evaluated and validated with a set of extensive experiments in this study. Our video Transformer outperforms its previous counterparts on two well-known datasets, Kinetics400 and Something-Something-v2.
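The SSTSA scheme described above can be illustrated with a minimal NumPy sketch. The abstract specifies temporal and spatial MSA modules for the spatiotemporal branch plus a parallel spatial self-attention branch, but not the exact wiring; the single-head attention, identity projections, and additive fusion below are simplifying assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # single-head scaled dot-product self-attention over (batch, length, dim),
    # with identity Q/K/V projections for brevity (an assumption)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def sstsa_block(tokens):
    """tokens: (T, N, D) — T frames, N spatial patches, D channels.
    Sketch of synchronized spatiotemporal and spatial attention:
      branch A: temporal attention (across frames, per patch location),
                followed by spatial attention (across patches, per frame);
      branch B: spatial-only attention, run in parallel;
      the two branches are fused by addition (the fusion is an assumption).
    """
    # temporal attention: N sequences of length T
    temporal = self_attention(tokens.transpose(1, 0, 2)).transpose(1, 0, 2)
    spatiotemporal = self_attention(temporal)  # spatial attention per frame
    spatial_only = self_attention(tokens)      # parallel spatial branch
    return spatiotemporal + spatial_only
```

The key point of the sketch is that the spatial-only branch sees the raw tokens, so purely spatial cues are preserved even after the temporal mixing in the other branch.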
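The motion spotlighting idea — embedding short-term motion into the regular RGB input so a single-stream Transformer can use it — can be sketched as follows. The abstract does not give the module's formulation; frame differencing and multiplicative modulation with a strength `alpha` are assumptions used here purely for illustration.

```python
import numpy as np

def motion_spotlight(clip, alpha=0.5):
    """clip: (T, H, W, 3) float frames in [0, 1].
    Sketch of motion spotlighting: highlight regions that change between
    consecutive frames by modulating the RGB input with the normalized
    frame difference. The result stays a regular RGB clip, so it can feed
    a single-stream video Transformer without a separate motion stream.
    """
    # per-frame difference; the first frame is compared with itself
    diff = np.abs(np.diff(clip, axis=0, prepend=clip[:1]))
    motion = diff / (diff.max() + 1e-8)        # normalize to [0, 1]
    return clip * (1.0 + alpha * motion)       # amplify moving pixels
```

On a static clip the difference map is zero everywhere, so the input passes through unchanged; only moving regions are amplified.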
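The intraclass frame interlacing augmentation can likewise be sketched in a few lines. The abstract only states that frames of input clips from the same class are interlaced; the alternating even/odd pattern below is one plausible scheme, chosen as an assumption.

```python
import numpy as np

def interlace_clips(clip_a, clip_b):
    """clip_a, clip_b: (T, ...) clips sampled from the SAME action class.
    Sketch of intraclass frame interlacing: alternate frames of the two
    clips to form a new training clip of the same length T, giving the
    model a harder but label-preserving sample.
    """
    assert clip_a.shape == clip_b.shape
    out = clip_a.copy()
    out[1::2] = clip_b[1::2]  # odd-indexed frames come from the second clip
    return out
```

Because both clips share the same label, the interlaced clip keeps that label, which is what makes this a valid augmentation rather than a mixing scheme that would require soft targets.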