ClutterDexGrasp: A Sim-to-Real System for General Dexterous Grasping in Cluttered Scenes

Dexterous grasping in cluttered scenes presents significant challenges due to diverse object geometries, occlusions, and potential collisions. Existing methods primarily focus on single-object grasping or on grasp-pose prediction without interaction, and both are insufficient for complex, cluttered scenes. Recent vision-language-action models offer a potential solution, but they require extensive real-world demonstrations, which makes them costly and difficult to scale. To address these limitations, we revisit the sim-to-real transfer pipeline and develop key techniques that enable zero-shot deployment in the real world while maintaining robust generalization. We propose ClutterDexGrasp, a two-stage teacher-student framework for closed-loop, target-oriented dexterous grasping in cluttered scenes. The framework features a teacher policy trained in simulation with clutter-density curriculum learning. The teacher incorporates a novel scene representation that embeds both geometric and spatial information, together with a comprehensive safety curriculum, enabling general, dynamic, and safe grasping behaviors. Through imitation learning, we distill the teacher's knowledge into a student 3D diffusion policy (DP3) that operates on partial point cloud observations. To the best of our knowledge, this is the first zero-shot sim-to-real closed-loop system for target-oriented dexterous grasping in cluttered scenes, and it demonstrates robust performance across diverse objects and layouts. More details and videos are available at this https URL.
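The abstract names clutter-density curriculum learning but does not give its schedule. The sketch below illustrates the general idea under stated assumptions: a hypothetical `ClutterCurriculum` class that starts the teacher in sparse scenes and promotes it to denser ones once its rolling grasp success rate clears a threshold. The object counts per level, the 0.8 promotion threshold, and the 100-episode window are all illustrative assumptions, not values from the paper.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ClutterCurriculum:
    """Hypothetical clutter-density curriculum (illustrative, not the paper's schedule).

    Scenes start sparse; the curriculum advances to the next, denser level once
    the rolling success rate over the last `window` episodes reaches
    `promote_threshold`.
    """

    levels: tuple = (1, 4, 8, 12)  # assumed number of objects per scene, easiest first
    promote_threshold: float = 0.8  # assumed success rate required to advance
    window: int = 100  # number of recent episodes used to estimate success rate
    level: int = 0
    history: deque = field(default_factory=deque)

    def num_objects(self) -> int:
        """Number of objects to spawn in the next simulated scene."""
        return self.levels[self.level]

    def report(self, success: bool) -> None:
        """Record one episode outcome and promote to a denser level if warranted."""
        self.history.append(success)
        if len(self.history) > self.window:
            self.history.popleft()  # keep only the most recent `window` outcomes
        window_full = len(self.history) == self.window
        rate = sum(self.history) / len(self.history)
        can_advance = self.level < len(self.levels) - 1
        if window_full and rate >= self.promote_threshold and can_advance:
            self.level += 1
            self.history.clear()  # re-estimate success from scratch at the new density
```

For example, `ClutterCurriculum(window=10)` spawns one object per scene until ten consecutive reported successes, after which `num_objects()` returns 4. Clearing the history on promotion is a design choice here: it prevents successes earned in sparse scenes from immediately promoting the policy again at the harder level.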
@article{chen2025_2506.14317,
  title={ClutterDexGrasp: A Sim-to-Real System for General Dexterous Grasping in Cluttered Scenes},
  author={Zeyuan Chen and Qiyang Yan and Yuanpei Chen and Tianhao Wu and Jiyao Zhang and Zihan Ding and Jinzhou Li and Yaodong Yang and Hao Dong},
  journal={arXiv preprint arXiv:2506.14317},
  year={2025}
}