Equivariant Diffusion Model with A5-Group Neurons for Joint Pose Estimation and Shape Reconstruction

等变映射姿势人工智能计算机科学计算机视觉接头（建筑物）群（周期表）三维姿态估计模式识别（心理学）估计数学工程类建筑工程化学有机化学系统工程纯数学

作者

Boyan Wan,Yifei Shi,Xiaohong Chen,Kai Xu

出处

期刊：IEEE Transactions on Pattern Analysis and Machine Intelligence [IEEE Computer Society]
日期：2025-01-01 卷期号：: 1-15

链接

nih.govdoi.org

标识

DOI：10.1109/tpami.2025.3540593

摘要

Object pose estimation and shape reconstruction are inherently coupled tasks although they have so far been studied separately in most existing approaches. A few recent works addressed the problem of joint pose estimation and shape reconstruction, but they found difficulties in handling partial observations and shape ambiguities. An open challenge in this area is to design a mechanism that has the two tasks benefit each other and boost the performance and robustness of both. In this work, we advocate the use of diffusion models for joint estimation of category-level object poses and reconstruction of object geometry. Diffusion models formulate shape reconstruction as a generation process conditioned on input observations. It has two main advantages. First, the iterative inference of diffusion models provides a mechanism for iterative optimization for both pose estimation and shape reconstruction. Second, diffusion models allow multiple outputs starting from different input noises, which would address the problem of ambiguity caused by partial observations. To achieve this, we propose equivariant diffusion model for joint pose estimation and shape reconstruction. The approach consists of an equivariant feature extractor to aggregate features of the input point cloud and a ShapePose diffusion model to generate object pose and shape simultaneously. To avoid training the model on all possible shape poses in the SO(3) space, we propose to augment the diffusion model with A5-group neurons where the neurons are converted into 5D vectors and can be rotated with the alternating group A5. Based on the A5-group neurons, we implement SO(3)-equivariant 3D point convolution and SO(3)-equivariant concatenation, making the entire network SO(3)-equivariant. Moreover, to select the most plausible combination of pose and shape from the generated ones, we propose a geometry-based measure of plausibility for an estimated pose along with a reconstructed shape. Extensive experiments demonstrate the effectiveness of the proposed method. Specifically, our method achieves the state-of-the-art on two public datasets and a new dataset with stacked objects, in terms of shape reconstruction and pose estimation. In particular, we show the proposed method could provide multiple plausible outputs under partial observations and shape ambiguities.

求助该文献

最长约 10秒，即可获得该文献文件

Equivariant Diffusion Model with A5-Group Neurons for Joint Pose Estimation and Shape Reconstruction

今日热心研友