Computer science
Perception
Artificial intelligence
Computer vision
Visualization
Visual perception
Computer graphics (images)
Psychology
Neuroscience
Authors
Ziyang Hong,C. Patrick Yue
Identifier
DOI:10.1109/tcsvt.2024.3406401
Abstract
We introduce a novel learning method that can effectively perceive both the geometric structure and semantic labels of a 3D scene in real time. Existing real-time 3D scene reconstruction approaches often rely on volumetric schemes to regress a Truncated Signed Distance Function (TSDF) as the 3D representation. However, these volumetric approaches primarily focus on ensuring global coherence in the reconstructed scene, which often results in a lack of local geometric detail. To address this limitation, we propose a solution that leverages the latent geometric knowledge present in 2D image features through explicit depth prediction, thereby creating anchored features, which are used to refine the learning of occupancy in the TSDF volume. Furthermore, we discover that this cross-dimensional feature refinement methodology can also be applied to the task of semantic segmentation by utilizing semantic priors. As a result, we propose an end-to-end cross-dimensional refinement neural network (CDRNet) that can extract both the 3D mesh and 3D semantic labeling of a scene in real time. Through experimental evaluation on multiple datasets, we demonstrate that our method achieves state-of-the-art 3D perception capability, improving on the prior art by over 40% in 3D semantic segmentation and over 18% in geometric reconstruction. These promising results indicate the significant potential of our approach for various industrial applications. Demo video and code can be found on the project page, https://hafred.github.io/cdrnet/.
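The abstract's core idea, creating "anchored features" by lifting 2D image features into the TSDF volume via explicit depth prediction, can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the pinhole back-projection and the voxel-averaging step below are generic assumptions, and all function names and parameters are hypothetical.

```python
import numpy as np

def unproject_features(depth, feats, K):
    """Back-project per-pixel 2D features into 3D camera coordinates
    using a (predicted) depth map and a pinhole intrinsic matrix K."""
    H, W = depth.shape
    us, vs = np.meshgrid(np.arange(W), np.arange(H))
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (us - cx) / fx * depth
    y = (vs - cy) / fy * depth
    pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)   # (H*W, 3) points
    return pts, feats.reshape(-1, feats.shape[-1])          # (H*W, C) features

def anchor_into_volume(pts, feats, origin, voxel_size, dims):
    """Average the back-projected features into the voxels they land in,
    yielding a feature volume that could refine TSDF occupancy learning."""
    idx = np.floor((pts - origin) / voxel_size).astype(int)
    valid = np.all((idx >= 0) & (idx < np.array(dims)), axis=1)
    idx, feats = idx[valid], feats[valid]
    flat = np.ravel_multi_index(idx.T, dims)
    C = feats.shape[-1]
    vol = np.zeros((np.prod(dims), C))
    cnt = np.zeros(np.prod(dims))
    np.add.at(vol, flat, feats)   # accumulate features per voxel
    np.add.at(cnt, flat, 1)      # count hits per voxel
    vol[cnt > 0] /= cnt[cnt > 0, None]
    return vol.reshape(*dims, C)
```

Features anchored this way sit only near the predicted surface, which is why they can supply the local geometric detail that purely volumetric regression tends to smooth away.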