Embodied Visuomotor Representation

Abstract

Imagine sitting at your desk, looking at various objects on it. While you do not know their exact distances from your eye in meters, you can reach out and touch them. Instead of an externally defined unit, your sense of distance is inherently tied to your actions' effect on your embodiment. In contrast, conventional robotics relies on precise calibration to external units, through which separate vision and control processes communicate. This necessitates highly engineered, expensive systems that cannot be easily reconfigured. To address this, we introduce Embodied Visuomotor Representation, a methodology through which robots infer distance in a unit implied by their actions, without depending on calibrated 3D sensors or known physical models. With it, we demonstrate that a robot without prior knowledge of its size, environmental scale, or strength can learn to touch and clear obstacles within seconds of operation. Likewise, in simulation, an agent without knowledge of its mass or strength can successfully jump across a gap of unknown size after a few test oscillations. These behaviors mirror natural strategies observed in bees and gerbils, which also lack calibration in an external unit, and highlight the potential of action-driven perception in robotics.

@article{burner2025_2410.00287,
  title={Embodied Visuomotor Representation},
  author={Levi Burner and Cornelia Fermüller and Yiannis Aloimonos},
  journal={arXiv preprint arXiv:2410.00287},
  year={2025}
}