S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces

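Computer science, Artificial intelligence, Generalization, Pattern recognition (psychology), Aliasing, Computer vision, Mathematics, Undersampling, Mathematical analysis
Authors
Éric Nguyen, Karan Goel, Albert Gu, G. W. Downs, Preey Shah, Tri Dao, Stephen A. Baccus, Christopher Ré
Source
Journal: Cornell University - arXiv  Cited by: 8
Identifier
DOI: 10.48550/arxiv.2210.06583
Abstract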

Visual data such as images and videos are typically modeled as discretizations of inherently continuous, multidimensional signals. Existing continuous-signal models attempt to exploit this fact by modeling the underlying signals of visual (e.g., image) data directly. However, these models have not yet been able to achieve competitive performance on practical vision tasks such as large-scale image and video classification. Building on a recent line of work on deep state space models (SSMs), we propose S4ND, a new multidimensional SSM layer that extends the continuous-signal modeling ability of SSMs to multidimensional data including images and videos. We show that S4ND can model large-scale visual data in 1D, 2D, and 3D as continuous multidimensional signals, and that it demonstrates strong performance when Conv2D and self-attention layers in existing state-of-the-art models are simply replaced with S4ND layers. On ImageNet-1k, S4ND exceeds the performance of a Vision Transformer baseline by 1.5% when training with a 1D sequence of patches, and matches ConvNeXt when modeling images in 2D. For videos, S4ND improves on an inflated 3D ConvNeXt in activity classification on HMDB-51 by 4%. S4ND implicitly learns global, continuous convolutional kernels that are resolution invariant by construction, providing an inductive bias that enables generalization across multiple resolutions. By developing a simple bandlimiting modification to S4 to overcome aliasing, S4ND achieves strong zero-shot (unseen at training time) resolution performance, outperforming a baseline Conv2D by 40% on CIFAR-10 when trained on 8 × 8 and tested on 32 × 32 images. When trained with progressive resizing, S4ND comes within ~1% of a high-resolution model while training 22% faster.
