Quadrupedal robots have demonstrated impressive locomotion capabilities in complex environments, but equipping them with versatile, autonomous manipulation skills in a scalable way remains a significant challenge. In this work, we introduce a cross-embodiment imitation learning system for quadrupedal manipulation, leveraging data collected from both humans and LocoMan, a quadruped equipped with multiple manipulation modes. Specifically, we develop a teleoperation and data collection pipeline that unifies and modularizes the observation and action spaces of the human and the robot. To effectively leverage the collected data, we propose an efficient modularized architecture that supports co-training and pretraining on structured, modality-aligned data across different embodiments. Additionally, we construct the first manipulation dataset for the LocoMan robot, covering various household tasks in both unimanual and bimanual modes, supplemented by a corresponding human dataset. We validate our system on six real-world manipulation tasks, where it achieves an average success rate improvement of 41.9% overall and 79.7% under out-of-distribution (OOD) settings compared to the baseline. Pretraining with human data contributes a 38.6% success rate improvement overall and 82.7% under OOD settings, enabling consistently better performance with only half the amount of robot data. Our code, hardware, and data are open-sourced at: this https URL.
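To make the cross-embodiment idea concrete, below is a minimal sketch (not the authors' released code) of one way a modularized policy could support co-training across embodiments: per-modality encoders produce modality-aligned tokens, a shared trunk processes them, and each embodiment (human or LocoMan) keeps its own action head. All module names, dimensions, and the two-embodiment setup are illustrative assumptions.

```python
# Hypothetical sketch of a modularized cross-embodiment policy.
# Assumptions: token/feature sizes, modality names, and head names are made up.
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Projects one observation modality (e.g. camera features, proprioception) to a token."""

    def __init__(self, input_dim: int, token_dim: int):
        super().__init__()
        self.proj = nn.Linear(input_dim, token_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, input_dim) -> (batch, 1, token_dim), one token per modality
        return self.proj(x).unsqueeze(1)


class CrossEmbodimentPolicy(nn.Module):
    def __init__(self, modality_dims: dict, action_dims: dict, token_dim: int = 128):
        super().__init__()
        # Modality-aligned encoders, shared across embodiments when the modality
        # exists for both (an assumption in this sketch).
        self.encoders = nn.ModuleDict(
            {name: ModalityEncoder(d, token_dim) for name, d in modality_dims.items()}
        )
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)  # shared trunk
        # One action head per embodiment, e.g. "human" and "locoman".
        self.heads = nn.ModuleDict(
            {name: nn.Linear(token_dim, d) for name, d in action_dims.items()}
        )

    def forward(self, obs: dict, embodiment: str) -> torch.Tensor:
        # Encode whichever modalities this embodiment provides, pool, then decode
        # actions with the embodiment-specific head.
        tokens = torch.cat([self.encoders[m](x) for m, x in obs.items()], dim=1)
        features = self.trunk(tokens).mean(dim=1)
        return self.heads[embodiment](features)


if __name__ == "__main__":
    policy = CrossEmbodimentPolicy(
        modality_dims={"camera_feat": 512, "proprio": 32},
        action_dims={"human": 12, "locoman": 18},
    )
    obs = {"camera_feat": torch.randn(4, 512), "proprio": torch.randn(4, 32)}
    print(policy(obs, embodiment="human").shape)    # torch.Size([4, 12])
    print(policy(obs, embodiment="locoman").shape)  # torch.Size([4, 18])
```

Under this kind of factorization, human pretraining updates the shared encoders and trunk, and robot co-training or fine-tuning only needs to adapt the embodiment-specific components, which is one plausible reason the abstract's reported gains hold even with half the robot data.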
@article{niu2025_2506.16475,
  title   = {Human2LocoMan: Learning Versatile Quadrupedal Manipulation with Human Pretraining},
  author  = {Yaru Niu and Yunzhe Zhang and Mingyang Yu and Changyi Lin and Chenhao Li and Yikai Wang and Yuxiang Yang and Wenhao Yu and Tingnan Zhang and Zhenzhen Li and Jonathan Francis and Bingqing Chen and Jie Tan and Ding Zhao},
  journal = {arXiv preprint arXiv:2506.16475},
  year    = {2025}
}