PoseDiffusion: A Coarse-to-Fine Framework for Unseen Object 6-DoF Pose Estimation

Jiaming Zhou, Qing Zhu, Yaonan Wang, Mingtao Feng, Chengzhong Wu, Xuebing Liu, Jianan Huang, Ajmal Mian

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Accurately estimating the six-degrees-of-freedom (6-DoF) pose of unseen objects is crucial for successful robotic manipulation in industrial automation. Some existing methods for this task rely on prior knowledge of individual objects, i.e., the model must be trained on the exact object instance or object category. Others perform unseen object pose estimation but are limited in their feature learning and pose refinement ability. To address these problems, we propose an unseen object pose estimation method that follows a coarse-to-fine framework and leverages the powerful learning ability of diffusion models. We introduce a diffusion model for generating object poses and compare the generated poses with the original pose to determine the optimal one. We design a novel pose estimation module to provide coarse poses for PoseDiffusion. This module comprises two feature extraction modules that extract global and masked features. In addition, we propose a strategy to estimate the pose by comparing the similarity between rendered and query poses. The renderings of an unseen object from various viewpoints are generated from its computer-aided design (CAD) model. Our method requires a CAD model of the unseen object only during inference, a scenario well suited to industrial applications. Experimental evaluation on benchmark datasets demonstrates that the proposed framework outperforms existing approaches, achieving state-of-the-art performance in 6-DoF object pose estimation.
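
The similarity-based coarse stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each rendered view of the CAD model and the query image have already been encoded as feature vectors by some extractor, and it picks the rendered viewpoint whose feature has the highest cosine similarity to the query. The function name `coarse_pose_from_templates`, the helper `rot_z`, the feature dimension, and the toy data are all hypothetical.

```python
import numpy as np


def coarse_pose_from_templates(query_feat, template_feats, template_rotations, top_k=1):
    """Rank rendered-view rotations by cosine similarity to the query feature.

    query_feat:         (D,) feature of the query image (assumed precomputed).
    template_feats:     (N, D) features of N rendered views (assumed precomputed).
    template_rotations: (N, 3, 3) rotation used to render each view.
    Returns the top_k rotations and their similarity scores.
    """
    # L2-normalise so that the dot product equals cosine similarity.
    q = query_feat / (np.linalg.norm(query_feat) + 1e-12)
    t = template_feats / (np.linalg.norm(template_feats, axis=1, keepdims=True) + 1e-12)
    scores = t @ q
    # Indices of the best-matching views, highest similarity first.
    order = np.argsort(-scores)[:top_k]
    return template_rotations[order], scores[order]


def rot_z(theta):
    """Rotation matrix about the z-axis (used only to build toy viewpoints)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Toy data: 8 viewpoints with random stand-in features (hypothetical values).
rng = np.random.default_rng(0)
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
template_rotations = np.stack([rot_z(a) for a in angles])
template_feats = rng.normal(size=(8, 16))

# The query is a slightly perturbed copy of view 3, so view 3 should win.
query_feat = template_feats[3] + 0.01 * rng.normal(size=16)

best_rot, best_score = coarse_pose_from_templates(
    query_feat, template_feats, template_rotations
)
```

In a coarse-to-fine pipeline of the kind the abstract outlines, a rotation selected this way would serve only as an initial estimate that a later refinement stage (here, the diffusion model) improves.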

Original language: English
Pages (from-to): 11127-11138
Number of pages: 12
Journal: IEEE Transactions on Industrial Informatics
Volume: 20
Issue number: 9
Early online date: 23 May 2024
Publication status: Published - 2024
