3D Equivariant Visuomotor Policy Learning via Spherical Projection

22 May 2025
Boce Hu
Dian Wang
David Klee
Heng Tian
Xupeng Zhu
Haojie Huang
Robert Platt
Robin Walters
arXiv | PDF | HTML
Abstract

Equivariant models have recently been shown to improve the data efficiency of diffusion policy by a significant margin. However, prior work that explored this direction focused primarily on point cloud inputs generated by multiple cameras fixed in the workspace. This type of point cloud input is not compatible with the now-common setting where the primary input modality is an eye-in-hand RGB camera like a GoPro. This paper closes this gap by incorporating into the diffusion policy model a process that projects features from the 2D RGB camera image onto a sphere. This enables us to reason about symmetries in SO(3) without explicitly reconstructing a point cloud. We perform extensive experiments in both simulation and the real world that demonstrate that our method consistently outperforms strong baselines in terms of both performance and sample efficiency. Our work is the first SO(3)-equivariant policy learning framework for robotic manipulation that works using only monocular RGB inputs.
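The core idea of the abstract, lifting per-pixel image features onto the unit sphere via the camera's viewing rays, can be illustrated with a short sketch. This is not the authors' implementation; the intrinsic matrix `K`, the feature-map shapes, and the function name are illustrative assumptions based on a standard pinhole-camera back-projection.

```python
# Minimal sketch (assumed, not the paper's code): map a (H, W, C) image feature
# map onto points of the unit sphere S^2 by back-projecting each pixel along its
# viewing ray and normalizing to unit length.
import numpy as np

def project_features_to_sphere(features: np.ndarray, K: np.ndarray):
    """Return (H*W, 3) unit ray directions on S^2 and the matching (H*W, C) features.

    Each pixel (u, v) is back-projected as K^{-1} [u, v, 1]^T in the camera frame,
    then normalized, so the feature is attached to a direction on the sphere.
    """
    H, W, C = features.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))              # pixel grid, shape (H, W)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)            # homogeneous pixel coords
    rays = pix.reshape(-1, 3).astype(np.float64) @ np.linalg.inv(K).T   # camera-frame rays
    dirs = rays / np.linalg.norm(rays, axis=1, keepdims=True)   # points on the unit sphere
    return dirs, features.reshape(-1, C)

# Toy usage with an assumed 64x64 feature map and intrinsics.
K = np.array([[60.0, 0.0, 32.0],
              [0.0, 60.0, 32.0],
              [0.0, 0.0, 1.0]])
feat = np.random.rand(64, 64, 16)
dirs, flat_feat = project_features_to_sphere(feat, K)
print(dirs.shape, flat_feat.shape)  # (4096, 3) (4096, 16)
```

Attaching features to sphere directions in this way avoids reconstructing an explicit point cloud while still exposing the SO(3) structure that an equivariant policy network can exploit.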

View on arXiv
@article{hu2025_2505.16969,
  title={3D Equivariant Visuomotor Policy Learning via Spherical Projection},
  author={Boce Hu and Dian Wang and David Klee and Heng Tian and Xupeng Zhu and Haojie Huang and Robert Platt and Robin Walters},
  journal={arXiv preprint arXiv:2505.16969},
  year={2025}
}