IEP-Grasp: Learning Generalizable Dexterous Grasping through Interactive Embodied Priors

Institution Name
Conference name and year

*Indicates Equal Contribution

Given an unknown object's point cloud, our method first applies a human-inspired Hand-Object Prior Guidance (HOPG) module to decompose the shape and evaluate hand-object contacts, then employs an Embodied Affordance Module (EAM) with online interactive updates to refine grasp affordances, and finally executes the Dexterous Grasping Module (DGM) to generate stable multi-finger grasp poses in the real world.

Abstract

Robotic dexterous grasping is a fundamental operation in industrial automation and human-robot collaboration, yet the high-dimensional action space and diverse object shapes pose significant challenges for autonomous learning with multi-finger hands. To tackle these issues, we propose IEP-Grasp, an autonomous grasping algorithm that integrates interactive embodied priors based on hand-object contact constraints into deep reinforcement learning with point cloud inputs, enabling the learning of both the final grasp pose and the dynamic grasping process. By focusing the agent's attention on critical grasping points, these priors enhance grasp stability and provide interpretable insights into hand-object interactions. First, an unsupervised heuristic algorithm decomposes object point clouds into basic cuboid shapes, allowing a human-inspired hand-object prior guidance module to evaluate contact states and improve adaptability. Our approach then refines a grasp affordance map through interactive exploration of contact patterns, enabling the agent to locate optimal hand-object configurations. Finally, these interactive embodied priors are embedded into both the observation and the reward signal, synchronizing the learning of the priors with that of the manipulation policy. Extensive experiments in simulated and real-world environments with a 16-degree-of-freedom, four-fingered Allegro hand demonstrate significant improvements over baselines in grasp success rate and stability across diverse objects and categories.
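The cuboid decomposition is described only at a high level here. As a rough, hypothetical illustration of such an unsupervised heuristic, the Python sketch below recursively splits a point cloud into oriented cuboid primitives using PCA; the function names (fit_cuboid, decompose) and the max_extent threshold are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def fit_cuboid(points: np.ndarray):
    """Fit an oriented cuboid (center, axes, extents) to an (N, 3) point set via PCA."""
    center = points.mean(axis=0)
    centered = points - center
    # Principal axes of the point set give the cuboid's orientation.
    _, _, axes = np.linalg.svd(centered, full_matrices=False)
    local = centered @ axes.T                      # points in the cuboid frame
    extents = local.max(axis=0) - local.min(axis=0)
    return center, axes, extents

def decompose(points: np.ndarray, max_extent: float = 0.08, min_points: int = 20):
    """Greedily split the cloud along its longest principal axis until every
    part fits in a cuboid with no side longer than `max_extent` (meters)."""
    parts, queue = [], [points]
    while queue:
        pts = queue.pop()
        center, axes, extents = fit_cuboid(pts)
        longest = int(np.argmax(extents))
        if extents[longest] <= max_extent or len(pts) < min_points:
            parts.append((center, axes, extents))
            continue
        # Split at the midpoint of the longest axis; both halves stay non-empty.
        proj = (pts - center) @ axes[longest]
        mid = 0.5 * (proj.min() + proj.max())
        queue.extend([pts[proj < mid], pts[proj >= mid]])
    return parts

if __name__ == "__main__":
    cloud = np.random.rand(2000, 3) * [0.05, 0.05, 0.25]   # a tall, box-like blob
    print(len(decompose(cloud)), "cuboid primitives")
```

Decomposing the object into simple primitives of this kind gives the guidance module flat faces and edges against which fingertip contact states can be evaluated.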

Method Overview


This approach first obtains the point cloud of the scene using an RGB-D camera. Guided by human grasp priors, HOPG then evaluates contact relationships between the object's point cloud and the dexterous hand. EAM next leverages knowledge from embodied interaction training to generate affordance priors for the object's point cloud. Finally, DGM integrates these priors with the scene point cloud to produce dexterous grasping actions. Dashed lines indicate training-only processes. Notably, our policies, trained in simulation on point cloud inputs, transfer to real-world settings without fine-tuning and generalize even to objects unseen during training.
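As a minimal sketch of how the three modules might compose at inference time (the page does not specify their interfaces), the snippet below appends the HOPG contact prior and EAM affordance map to the point cloud as per-point observation features before calling the policy; hopg_contact_prior, eam_affordance, dgm_policy, and grasp_step are hypothetical placeholders, not the released implementation.

```python
import numpy as np

def hopg_contact_prior(scene_pc: np.ndarray, fingertips: np.ndarray) -> np.ndarray:
    """HOPG stand-in: score each scene point by proximity to the nearest fingertip."""
    d = np.linalg.norm(scene_pc[:, None, :] - fingertips[None, :, :], axis=-1)
    return np.exp(-d.min(axis=1) / 0.02)           # (N,) scores in (0, 1]

def eam_affordance(scene_pc: np.ndarray) -> np.ndarray:
    """EAM stand-in: a uniform affordance map (updated online via interaction in the paper)."""
    return np.full(len(scene_pc), 0.5)

def dgm_policy(observation: np.ndarray, hand_state: np.ndarray) -> np.ndarray:
    """DGM stand-in: a random 16-DoF action for the four-fingered Allegro hand."""
    return np.random.uniform(-1.0, 1.0, size=16)

def grasp_step(scene_pc: np.ndarray, fingertips: np.ndarray,
               hand_state: np.ndarray) -> np.ndarray:
    """One policy step: both priors are appended to each point as extra features,
    mirroring how the priors enter the observation per the abstract."""
    contact = hopg_contact_prior(scene_pc, fingertips)[:, None]
    afford = eam_affordance(scene_pc)[:, None]
    obs = np.concatenate([scene_pc, contact, afford], axis=1)   # (N, 5)
    return dgm_policy(obs, hand_state)

if __name__ == "__main__":
    pc = np.random.rand(1024, 3)
    tips = np.random.rand(4, 3)                    # four fingertip positions
    action = grasp_step(pc, tips, hand_state=np.zeros(16))
    print(action.shape)                            # (16,)
```

The sketch covers only the observation side; the abstract notes that the priors also shape the reward signal during training, which is omitted here.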

Real World Experiments

Real World Setting


Real World Results

1. The Performance of Our Method on Various Bottles

Mustard bottle

Cleanser bottle

Wine bottle

Laundry bottle

Beverage bottle

Spray bottle

2. The Performance of Our Method on Various Cans and Fruit

Chips can

Beer can

Coke can

Soup can

Meat can

Banana

3. The Performance of Our Method on Miscellaneous Objects

Goblet

Sugar box

Cracker box

4. The Performance of the Real-World Baseline

Mustard bottle

Cleanser bottle

Wine bottle

Laundry bottle

Beverage bottle

Spray bottle

Chips can

Beer can

Coke can

Soup can

Meat can

Banana

Goblet

Sugar box

Cracker box

BibTeX

BibTeX code here