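Computer science
Artificial intelligence
Computer vision
Gaussian distribution
Computer graphics (images)
Iterative reconstruction
Physics
Quantum mechanics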
Authors
Qiuhong Shen, Zike Wu, Xuanyu Yi, Pan Zhou, Hanwang Zhang, Shuicheng Yan, Xinchao Wang
Identifier
DOI:10.1109/tpami.2025.3569596
Abstract
We tackle the challenge of efficiently reconstructing a 3D asset from a single image at millisecond speed. In this work, we introduce Gamba, an end-to-end 3D reconstruction model from a single-view image, emphasizing two main insights: (1) Efficient Backbone Design: introducing a Mamba-based GambaFormer network to model 3D Gaussian Splatting (3DGS) reconstruction as sequential prediction with linear scalability of token length, thereby accommodating a substantial number of Gaussians; (2) Robust Gaussian Constraints: deriving radial mask constraints from multi-view masks to eliminate the need for warmup supervision of 3D point clouds in training. We trained Gamba on Objaverse and assessed it against existing optimization-based and feed-forward 3D reconstruction approaches on the GSO Dataset, among which Gamba is the only end-to-end trained single-view reconstruction model with 3DGS. Experimental results demonstrate its competitive generation capabilities both qualitatively and quantitatively and highlight its remarkable speed: Gamba completes reconstruction within 0.05 seconds on a single NVIDIA A100 GPU, which is about $1,000\times$ faster than optimization-based methods. Please see our project page at https://florinshen.github.io/gamba-project/.