GraspLDM: Generative 6-DoF Grasp Synthesis Using Latent Diffusion Models

Bibliographic Details
Main Authors: Kuldeep R. Barad, Andrej Orsula, Antoine Richard, Jan Dentler, Miguel A. Olivares-Mendez, Carol Martinez
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10744565/
Description
Summary: Vision-based grasping of unknown objects in unstructured environments is a key challenge for autonomous robotic manipulation. A practical grasp synthesis system is required to generate a diverse set of 6-DoF grasps from which a task-relevant grasp can be executed. Although generative models are suitable for learning such complex data distributions, existing models have limitations in grasp quality, long training times, and a lack of flexibility for task-specific generation. In this work, we present GraspLDM, a modular generative framework for 6-DoF grasp synthesis that uses diffusion models as priors in the latent space of a VAE. GraspLDM learns a generative model of object-centric $SE(3)$ grasp poses conditioned on point clouds. GraspLDM's architecture enables us to train task-specific models efficiently by only re-training a small denoising network in the low-dimensional latent space, as opposed to existing models that need expensive re-training. Our framework provides robust and scalable models on both full and partial point clouds. GraspLDM models trained with simulation data transfer well to the real world without any further fine-tuning. Our models provide an 80% success rate for 80 grasp attempts of diverse test objects across two real-world robotic setups.
ISSN: 2169-3536