SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings

This paper introduces SoccerDiffusion, a transformer-based diffusion model designed to learn end-to-end control policies for humanoid robot soccer directly from real-world gameplay recordings. Using data collected from RoboCup competitions, the model predicts joint command trajectories from multi-modal sensor inputs, including vision, proprioception, and game state. We employ a distillation technique that reduces the multi-step diffusion process to a single step, enabling real-time inference on embedded platforms. Our results demonstrate the model's ability to replicate complex motion behaviors such as walking, kicking, and fall recovery, both in simulation and on physical robots. Although high-level tactical behavior remains limited, this work provides a robust foundation for subsequent reinforcement learning or preference optimization methods. We release the dataset, pretrained models, and code at: this https URL
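The one-step distillation mentioned in the abstract can be illustrated with a toy sketch: a stand-in multi-step sampler plays the role of the diffusion teacher, and a single affine map is fit to reproduce its output in one forward pass. This is not the paper's implementation; the iterative update, the least-squares student, and all dimensions are illustrative assumptions.

```python
# Toy sketch (NOT the paper's implementation): distill a multi-step
# sampler into a one-step student, illustrating how T denoising
# iterations can be replaced by a single forward pass at inference time.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "joint command" data; the real model outputs joint trajectories.
targets = rng.normal(size=(256, 8))

def teacher_sample(noise, steps=50):
    """Crude proxy for a T-step diffusion sampler: iteratively nudge
    noise toward the data mean."""
    x = noise.copy()
    mean = targets.mean(axis=0)
    for _ in range(steps):
        x += 0.1 * (mean - x)  # one small denoising step
    return x

# Distillation: fit a one-step student x_out = [noise, 1] @ W by least
# squares so it matches the teacher's multi-step output from the same noise.
noise = rng.normal(size=(256, 8))
teacher_out = teacher_sample(noise)
X = np.hstack([noise, np.ones((256, 1))])   # append a bias column
W, *_ = np.linalg.lstsq(X, teacher_out, rcond=None)

student_out = X @ W                          # single-step inference
err = np.abs(student_out - teacher_out).max()
```

Because the toy teacher is affine in its input noise, the student matches it almost exactly; in the paper's setting the student is a neural network trained to match the full diffusion chain, trading a small accuracy loss for real-time inference on embedded hardware.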
@article{vahl2025_2504.20808,
  title   = {SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings},
  author  = {Florian Vahl and Jörn Griepenburg and Jan Gutsche and Jasper Güldenstein and Jianwei Zhang},
  journal = {arXiv preprint arXiv:2504.20808},
  year    = {2025}
}