SEM: Enhancing Spatial Understanding for Robust Robot Manipulation

Main: 8 pages; Bibliography: 3 pages; Appendix: 2 pages
6 figures; 6 tables
Abstract

A key challenge in robot manipulation lies in developing policy models with strong spatial understanding, that is, the ability to reason about 3D geometry, object relations, and robot embodiment. Existing methods often fall short: 3D point cloud models lack semantic abstraction, while 2D image encoders struggle with spatial reasoning. To address this, we propose SEM (Spatial Enhanced Manipulation model), a novel diffusion-based policy framework that explicitly enhances spatial understanding from two complementary perspectives. A spatial enhancer augments visual representations with 3D geometric context, while a robot state encoder captures embodiment-aware structure through graph-based modeling of joint dependencies. By integrating these modules, SEM significantly improves spatial understanding, yielding robust and generalizable manipulation across diverse tasks and outperforming existing baselines.
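
To make the idea of graph-based modeling of joint dependencies concrete, here is a minimal toy sketch. It is not the SEM robot state encoder, whose architecture the abstract does not specify. It assumes a serial kinematic chain in which each joint is connected only to its immediate neighbors, and it performs one round of mean aggregation over that graph. The function names `chain_adjacency` and `propagate` are hypothetical and chosen only for illustration.

```python
def chain_adjacency(num_joints):
    """Adjacency matrix with self-loops for a serial chain (assumed topology)."""
    adj = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        adj[i][i] = 1.0  # self-loop keeps the joint's own state
        if i + 1 < num_joints:
            adj[i][i + 1] = 1.0  # link to the next joint in the chain
            adj[i + 1][i] = 1.0  # symmetric link back
    return adj


def propagate(adj, feats):
    """One round of mean aggregation over each joint and its neighbors."""
    dim = len(feats[0])
    out = []
    for row in adj:
        degree = sum(row)  # number of connected joints, including itself
        out.append(
            [sum(w * f[d] for w, f in zip(row, feats)) / degree for d in range(dim)]
        )
    return out


# Illustrative per-joint features (for example, one joint angle each).
joint_feats = [[0.0], [3.0], [9.0]]
adj = chain_adjacency(3)
mixed = propagate(adj, joint_feats)
```

After one step, each joint's feature blends its own state with that of its kinematic neighbors, which is the basic mechanism by which a graph encoder can expose joint dependencies to a downstream policy. A learned encoder would typically replace the plain average with trainable weights.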

@article{lin2025_2505.16196,
  title={SEM: Enhancing Spatial Understanding for Robust Robot Manipulation},
  author={Xuewu Lin and Tianwei Lin and Lichao Huang and Hongyu Xie and Yiwei Jin and Keyu Li and Zhizhong Su},
  journal={arXiv preprint arXiv:2505.16196},
  year={2025}
}