Patches Are All You Need?

Authors
Trockman, Asher; Kolter, J. Zico
Source
Journal: Cornell University - arXiv; citations: 1
Identifier
DOI: 10.48550/arxiv.2201.09792
Abstract

Although convolutional networks have been the dominant architecture for vision tasks for many years, recent experiments have shown that Transformer-based models, most notably the Vision Transformer (ViT), may exceed their performance in some settings. However, due to the quadratic runtime of the self-attention layers in Transformers, ViTs require the use of patch embeddings, which group together small regions of the image into single input features, in order to be applied to larger image sizes. This raises a question: Is the performance of ViTs due to the inherently-more-powerful Transformer architecture, or is it at least partly due to using patches as the input representation? In this paper, we present some evidence for the latter: specifically, we propose the ConvMixer, an extremely simple model that is similar in spirit to the ViT and the even-more-basic MLP-Mixer in that it operates directly on patches as input, separates the mixing of spatial and channel dimensions, and maintains equal size and resolution throughout the network. In contrast, however, the ConvMixer uses only standard convolutions to achieve the mixing steps. Despite its simplicity, we show that the ConvMixer outperforms the ViT, MLP-Mixer, and some of their variants for similar parameter counts and data set sizes, in addition to outperforming classical vision models such as the ResNet. Our code is available at https://github.com/locuslab/convmixer.
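
The structure described in the abstract (patch input, separate spatial and channel mixing, constant size and resolution) can be illustrated with a small shape-and-parameter trace. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not specify layer details, so the layout used here (a stride-p convolution for the patch embedding, a depthwise convolution for spatial mixing, a 1x1 pointwise convolution for channel mixing, a two-parameter normalization layer after each convolution, and a linear classifier) is an assumption, and `convmixer_summary` and its argument names are hypothetical.

```python
def convmixer_summary(image_size, patch_size, hidden_dim, depth, kernel_size,
                      in_channels=3, num_classes=1000):
    """Trace feature-map shapes and count parameters for a ConvMixer-style stack.

    Assumed layout (not taken from the abstract): patch embedding, then `depth`
    blocks of depthwise conv + pointwise conv, each conv followed by a
    normalization layer with a scale and shift per channel, then a classifier.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")

    # Patch embedding: one feature vector per non-overlapping p x p patch,
    # so the spatial grid shrinks once, from image_size to image_size / p.
    grid = image_size // patch_size
    norm = 2 * hidden_dim  # per-channel scale and shift

    patch_embed = in_channels * patch_size ** 2 * hidden_dim + hidden_dim + norm
    shapes = [(hidden_dim, grid, grid)]

    # Spatial mixing: depthwise conv, one k x k filter per channel.
    depthwise = hidden_dim * kernel_size ** 2 + hidden_dim + norm
    # Channel mixing: 1x1 conv across all channels at each location.
    pointwise = hidden_dim * hidden_dim + hidden_dim + norm

    for _ in range(depth):
        # With "same" padding, every block keeps channels and resolution fixed.
        shapes.append((hidden_dim, grid, grid))

    classifier = hidden_dim * num_classes + num_classes
    total = patch_embed + depth * (depthwise + pointwise) + classifier
    return {"shapes": shapes, "params": total}


if __name__ == "__main__":
    # Hypothetical configuration chosen for illustration.
    summary = convmixer_summary(image_size=224, patch_size=7,
                                hidden_dim=1536, depth=20, kernel_size=9)
    print(summary["shapes"][0], summary["shapes"][-1], summary["params"])
```

The first and last entries of `shapes` are identical, which is the "equal size and resolution throughout the network" property; only the patch embedding changes the spatial grid.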
