Authors
Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, Ziwei Liu
Source
Journal: Cornell University - arXiv
Date: 2024-02-07
Identifier
DOI:10.48550/arxiv.2402.05054
Abstract
3D content creation has achieved significant progress in terms of both quality and speed. Although current feed-forward models can produce 3D objects in seconds, their resolution is constrained by the intensive computation required during training. In this paper, we introduce Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images. Our key insights are two-fold: 1) 3D Representation: We propose multi-view Gaussian features as an efficient yet powerful representation, which can then be fused together for differentiable rendering. 2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or single-view image input by leveraging multi-view diffusion models. Extensive experiments demonstrate the high fidelity and efficiency of our approach. Notably, we maintain the fast speed to generate 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation.
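The first insight above, fusing per-view Gaussian features into a single set for differentiable rendering, can be sketched minimally as follows. This is an illustration only, not the paper's implementation: the function name and tensor shapes are hypothetical, and the 14 channels per pixel (3 position + 3 scale + 4 rotation quaternion + 1 opacity + 3 color) follow a common 3D Gaussian splatting parameterization.

```python
import numpy as np

def fuse_multiview_gaussians(view_features):
    """Fuse per-view Gaussian feature maps into one Gaussian set.

    Each element of `view_features` is an (H, W, C) map where every
    pixel encodes the parameters of one 3D Gaussian. Fusion here is
    simply flattening each view and concatenating along the first axis.
    """
    gaussians = [f.reshape(-1, f.shape[-1]) for f in view_features]
    return np.concatenate(gaussians, axis=0)

# Four hypothetical views, 32x32 Gaussians each, 14 parameters per Gaussian.
views = [np.random.rand(32, 32, 14).astype(np.float32) for _ in range(4)]
fused = fuse_multiview_gaussians(views)
print(fused.shape)  # (4096, 14): 4 views x 1024 pixels, 14 params each
```

The fused (N, 14) array is the kind of flat Gaussian list a differentiable rasterizer would consume; in the actual model, the per-view maps come from the asymmetric U-Net backbone rather than random data.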