Computer science
Scalability
Artificial intelligence
Visual space
Wire
Convolutional neural network
Receptive field
Computational complexity theory
Coding (set theory)
Computer vision
Algorithm
Theoretical computer science
Pattern recognition (psychology)
Perception
Geodesy
Set (abstract data type)
Database
Neuroscience
Biology
Programming language
Geography
Authors
Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, Yaowei Wang, Qixiang Ye, Yunfan Liu
Source
Journal: Cornell University - arXiv
Date: 2024-01-18
Citations: 350
Identifier
DOI: 10.48550/arxiv.2401.10166
Abstract
Designing computationally efficient network architectures remains an ongoing necessity in computer vision. In this paper, we adapt Mamba, a state-space language model, into VMamba, a vision backbone with linear time complexity. At the core of VMamba is a stack of Visual State-Space (VSS) blocks with the 2D Selective Scan (SS2D) module. By traversing along four scanning routes, SS2D bridges the gap between the ordered nature of 1D selective scan and the non-sequential structure of 2D vision data, which facilitates the collection of contextual information from various sources and perspectives. Based on the VSS blocks, we develop a family of VMamba architectures and accelerate them through a succession of architectural and implementation enhancements. Extensive experiments demonstrate VMamba's promising performance across diverse visual perception tasks, highlighting its superior input scaling efficiency compared to existing benchmark models. Source code is available at https://github.com/MzeroMiko/VMamba.
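The following is a minimal sketch, in PyTorch, of the four scanning routes described in the abstract: the 2D feature map is unfolded into four 1D sequences (row-major, column-major, and their reverses), each of which would be fed to a 1D selective scan, and the per-route outputs are folded back into the 2D layout. This is not the authors' implementation; the function names `cross_scan` and `cross_merge` and the tensor shapes are illustrative assumptions, and the actual SS2D module lives in the linked repository.

```python
# Illustrative sketch of the four scanning routes in SS2D (assumed names;
# not the official VMamba code). A 2D feature map (B, C, H, W) is unfolded
# into four 1D sequences, each of which would be processed by a 1D selective
# scan, and the per-route outputs are folded back into the 2D layout.
import torch


def cross_scan(x: torch.Tensor) -> torch.Tensor:
    """Unfold (B, C, H, W) into four scan routes of shape (B, 4, C, H*W)."""
    row_major = x.flatten(2)                               # left-to-right, top-to-bottom
    col_major = x.transpose(2, 3).flatten(2)               # top-to-bottom, left-to-right
    forward = torch.stack([row_major, col_major], dim=1)   # (B, 2, C, H*W)
    backward = forward.flip(-1)                            # the two reversed routes
    return torch.cat([forward, backward], dim=1)           # (B, 4, C, H*W)


def cross_merge(y: torch.Tensor, H: int, W: int) -> torch.Tensor:
    """Fold four scanned sequences (B, 4, C, H*W) back into (B, C, H, W)."""
    B, _, C, _ = y.shape
    merged = y[:, 0:2] + y[:, 2:4].flip(-1)                # undo the reversal, sum route pairs
    row_major = merged[:, 0].reshape(B, C, H, W)
    col_major = merged[:, 1].reshape(B, C, W, H).transpose(2, 3)
    return row_major + col_major


if __name__ == "__main__":
    x = torch.randn(2, 96, 14, 14)
    routes = cross_scan(x)                # (2, 4, 96, 196): one sequence per route
    out = cross_merge(routes, 14, 14)     # (2, 96, 14, 14)
    # With no per-route scan applied in between, the round trip sums four copies of x.
    assert torch.allclose(out, 4 * x)
```

In the full SS2D module, a 1D selective scan would be applied to each route between these two steps; the sketch above only illustrates the unfold/merge bookkeeping that connects the 1D scan to 2D vision data.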