CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image

计算机视觉人工智能组分（热力学）计算机图形学（图像）计算机科学图像（数学） RGB颜色模型三维重建迭代重建热力学物理

作者

Kaixin Yao,Xu Yan,Yan Zeng,Qixuan Zhang,Lan Xu,Wei Yang,Jiayuan Gu,Jingyi Yu

出处

期刊：ACM Transactions on Graphics [Association for Computing Machinery]
日期：2025-07-26 卷期号：44 (4): 1-19 被引量：1

标识

摘要

Recovering high-quality 3D scenes from a single RGB image is a challenging task in computer graphics. Current methods often struggle with domain-specific limitations or low-quality object generation. To address these, we propose CAST (Component-Aligned 3D Scene Reconstruction from a Single RGB Image), a novel method for 3D scene reconstruction. CAST starts by extracting object-level 2D segmentation and relative depth information from the input image, followed by using a GPT-based model to analyze inter-object spatial relations. This enables understanding of how objects relate to each other within the scene, ensuring more coherent reconstruction. CAST then employs an occlusion-aware large-scale 3D generation model to independently generate each object's full geometry, using Masked Auto Encoder (MAE) and point cloud conditioning to mitigate the effects of occlusions and partial object information, ensuring accurate alignment with the source image's geometry and texture. To align each object with the scene, the alignment generation model computes the necessary transformations, allowing the generated meshes to be accurately placed and integrated into the scene's point cloud. Finally, CAST applies a physics-aware correction mechanism, which leverages a fine-grained relation graph to generate a constraint graph. This graph guides the optimization of object poses, ensuring physical consistency and spatial coherence. By utilizing Signed Distance Fields (SDF), the model effectively addresses issues such as occlusions, object penetration, and floating objects, ensuring that the generated scene accurately reflects real-world physical interactions. Experimental results demonstrate that CAST significantly improves the quality of single-image 3D scene reconstruction, offering enhanced realism and accuracy in scene understanding and reconstruction tasks. CAST has practical applications in virtual content creation, such as immersive game environments and film production, where real-world setups can be seamlessly integrated into virtual landscapes. Additionally, CAST can be leveraged in robotics, enabling efficient real-to-simulation workflows and providing realistic, scalable simulation environments for robotic systems.

求助该文献

CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image

今日热心研友