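Equivariant map
Pose
Artificial intelligence
Computer science
Computer vision
Joint (building)
Group (periodic table)
3D pose estimation
Pattern recognition (psychology)
Estimation
Mathematics
Engineering
Architectural engineering
Chemistry
Organic chemistry
Systems engineering
Pure mathematics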
Authors
Boyan Wan, Yifei Shi, Xiaohong Chen, Kai Xu
Identifier
DOI:10.1109/tpami.2025.3540593
Abstract
Object pose estimation and shape reconstruction are inherently coupled tasks, although most existing approaches have studied them separately. A few recent works have addressed joint pose estimation and shape reconstruction, but they have difficulty handling partial observations and shape ambiguities. An open challenge in this area is to design a mechanism that lets the two tasks benefit each other, boosting the performance and robustness of both. In this work, we advocate the use of diffusion models for joint estimation of category-level object poses and reconstruction of object geometry. Diffusion models formulate shape reconstruction as a generation process conditioned on input observations. This formulation has two main advantages. First, the iterative inference of diffusion models provides a mechanism for iteratively optimizing both pose estimation and shape reconstruction. Second, diffusion models can produce multiple outputs starting from different input noises, which addresses the ambiguity caused by partial observations. To achieve this, we propose an equivariant diffusion model for joint pose estimation and shape reconstruction. The approach consists of an equivariant feature extractor that aggregates features of the input point cloud and a ShapePose diffusion model that generates object pose and shape simultaneously. To avoid training the model on all possible shape poses in the SO(3) space, we propose to augment the diffusion model with A5-group neurons, where the neurons are converted into 5D vectors and can be rotated with the alternating group A5. Based on the A5-group neurons, we implement SO(3)-equivariant 3D point convolution and SO(3)-equivariant concatenation, making the entire network SO(3)-equivariant. Moreover, to select the most plausible combination of pose and shape from the generated ones, we propose a geometry-based measure of plausibility for an estimated pose along with a reconstructed shape.
Extensive experiments demonstrate the effectiveness of the proposed method. Specifically, our method achieves state-of-the-art results on two public datasets and a new dataset with stacked objects, in terms of both shape reconstruction and pose estimation. In particular, we show that the proposed method can provide multiple plausible outputs under partial observations and shape ambiguities.
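
The SO(3)-equivariance property mentioned in the abstract can be illustrated numerically. The following is a minimal sketch and not the authors' A5-group neuron design: it assumes a much simpler layer that only mixes the channels of 3D vector features, and checks that rotating the input and then applying the layer gives the same result as applying the layer and then rotating the output. The names `random_rotation` and `channel_mixing_layer` are hypothetical and exist only for this illustration.

```python
import numpy as np


def random_rotation(rng):
    """Draw a random 3x3 rotation matrix (orthogonal, determinant +1)."""
    # QR decomposition of a Gaussian matrix yields an orthogonal factor q.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    # Normalize column signs so the factorization is well defined.
    q = q * np.sign(np.diag(r))
    # Flip one column if q is a reflection, so that det(q) = +1.
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def channel_mixing_layer(features, weights):
    """Mix feature channels; each channel is a 3D vector.

    features: (C_in, 3) array, weights: (C_out, C_in) array.
    Assumption: this stands in for an equivariant layer. It acts only on
    the channel axis, so it commutes with any rotation of the 3D axis.
    """
    return weights @ features


rng = np.random.default_rng(0)
features = rng.normal(size=(8, 3))   # 8 channels of 3D vector features
weights = rng.normal(size=(4, 8))    # maps 8 channels to 4 channels
rotation = random_rotation(rng)

# Rotate the input first, then apply the layer.
rotate_then_map = channel_mixing_layer(features @ rotation.T, weights)
# Apply the layer first, then rotate the output.
map_then_rotate = channel_mixing_layer(features, weights) @ rotation.T

# For an equivariant layer the two results agree up to rounding error.
equivariance_error = float(np.max(np.abs(rotate_then_map - map_then_rotate)))
print(equivariance_error)
```

A check of this kind is a common sanity test for equivariant layers in general: if the error is not near machine precision, the layer does not commute with rotations and the network would need to see rotated training examples instead.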