Self-Supervised Category-Level Articulated Object Pose Estimation with Part-Level SE(3) Equivariance

Abstract

Category-level articulated object pose estimation aims to estimate a hierarchy of articulation-aware object poses for an unseen articulated object from a known category. To reduce the heavy annotation burden of supervised learning methods, we present a novel self-supervised strategy that solves this problem without any human labels. Our key idea is to factorize canonical shapes and articulated object poses from input articulated shapes through part-level equivariant shape analysis. Specifically, we first introduce the concept of part-level SE(3) equivariance and devise a network to learn features with this property. Then, through a carefully designed fine-grained pose-shape disentanglement strategy, we expect canonical spaces that support pose estimation to be induced automatically. We can thus predict articulated object poses as per-part rigid transformations describing how parts transform from their canonical part spaces to the camera space. Extensive experiments demonstrate the effectiveness of our method on both complete and partial point clouds from synthetic and real articulated object datasets.
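To make the idea of "per-part rigid transformations" from canonical part spaces to the camera space concrete, here is a minimal NumPy sketch. It is not the method's implementation: the function names, the array layout, and the assumption that each point carries a hard part label are all hypothetical choices made for illustration.

```python
import numpy as np


def rotation_z(angle_rad):
    """Rotation matrix about the z-axis (helper for the toy example)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def apply_part_poses(canonical_points, part_labels, rotations, translations):
    """Map canonical points to camera space with one rigid pose per part.

    canonical_points: (N, 3) points in their canonical part spaces.
    part_labels:      (N,) integer part index of each point (assumed given).
    rotations:        (K, 3, 3) one rotation matrix per part.
    translations:     (K, 3) one translation vector per part.
    """
    R = rotations[part_labels]      # (N, 3, 3): rotation of each point's part
    t = translations[part_labels]   # (N, 3): translation of each point's part
    # x_cam = R_part @ x_canonical + t_part, evaluated for every point.
    return np.einsum("nij,nj->ni", R, canonical_points) + t


# Toy example: two parts, one point each.
points = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
labels = np.array([0, 1])
rotations = np.stack([np.eye(3), rotation_z(np.pi / 2)])
translations = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
camera_points = apply_part_poses(points, labels, rotations, translations)
```

In this toy setup, part 0 is only translated while part 1 is rotated and then translated, so points on different parts move independently, which is the behavior a per-part pose representation is meant to capture.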

Video

Segmentation and Alignment on Laptop (S)

Segmentation and Alignment on Oven

Segmentation and Alignment on Eyeglasses

Segmentation and Alignment on Safe

Segmentation and Alignment on Complete Shapes

Visualization of experimental results on complete point clouds. In each group of three shapes, from left to right: the input point cloud, the reconstruction, and the predicted canonical object shape. Some shapes are aligned to the same global frame only for easier viewing; their global poses may vary when they are fed into the network.

Segmentation and Alignment on Partial Shapes

Visualization of experimental results on partial point clouds. In each group of three shapes, from left to right: the input point cloud, the reconstruction, and the predicted canonical object shape.

Application: Shape Reconstruction and Manipulation

We can take shapes in varying articulation states and output their part-by-part reconstructions, as shown in the middle nine shapes. The predicted joints further allow us to manipulate the moving parts and place objects in new articulation states.
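As a rough illustration of manipulating a moving part with a predicted joint, the sketch below rotates a part's points about a joint axis using Rodrigues' rotation formula. It assumes a revolute joint described by a pivot point and an axis direction; that parameterization and the function name `rotate_about_joint` are assumptions for illustration, not details taken from the method.

```python
import numpy as np


def rotate_about_joint(points, pivot, axis, angle_rad):
    """Rotate part points about a revolute joint (hypothetical parameterization).

    points:    (N, 3) points of the moving part.
    pivot:     (3,) any point on the joint axis.
    axis:      (3,) direction of the joint axis (need not be unit length).
    angle_rad: rotation angle in radians.
    """
    points = np.asarray(points, dtype=float)
    pivot = np.asarray(pivot, dtype=float)
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([
        [0.0, -axis[2], axis[1]],
        [axis[2], 0.0, -axis[0]],
        [-axis[1], axis[0], 0.0],
    ])
    # Rodrigues' formula: R = I + sin(a) K + (1 - cos(a)) K^2.
    R = np.eye(3) + np.sin(angle_rad) * K + (1.0 - np.cos(angle_rad)) * (K @ K)
    # Rotate relative to the pivot, then shift back.
    return (points - pivot) @ R.T + pivot


# Toy example: open a "lid" point by 90 degrees about a z-axis hinge at x = 1.
lid = np.array([[2.0, 0.0, 0.0], [1.0, 0.0, 0.5]])
moved = rotate_about_joint(lid, pivot=[1.0, 0.0, 0.0], axis=[0.0, 0.0, 1.0],
                           angle_rad=np.pi / 2)
```

Points lying on the joint axis stay fixed, while the rest of the part sweeps around it; the static part of the object would simply be left untouched.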

BibTeX

@inproceedings{liu2023self,
      title={Self-Supervised Category-Level Articulated Object Pose Estimation with Part-Level SE(3) Equivariance},
      author={Liu, Xueyi and Zhang, Ji and Hu, Ruizhen and Huang, Haibin and Wang, He and Yi, Li},
      booktitle={The Eleventh International Conference on Learning Representations},
      year={2023}
    }