Effective human-robot interaction requires robots to identify human intentions and generate expressive, socially appropriate motions in real time. Existing approaches often rely on fixed motion libraries or computationally expensive generative models. We propose a hierarchical framework that combines intention-aware reasoning via in-context learning (ICL) with real-time motion generation using diffusion models. Our system introduces structured prompting with confidence scoring, fallback behaviors, and social context awareness to enable intention refinement and adaptive responses. Leveraging large-scale motion datasets and efficient latent-space denoising, the framework generates diverse, physically plausible gestures suitable for dynamic humanoid interactions. Experimental validation on a physical platform demonstrates the robustness and social alignment of our method in realistic scenarios.
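The abstract does not give implementation details, but the described pipeline (structured ICL prompting with confidence scoring, a fallback behavior when confidence is low, then latent-space diffusion for motion synthesis) can be sketched roughly as below. All names here are hypothetical: the intention labels, the `CONF_THRESHOLD` value, the helper functions `build_icl_prompt`, `infer_intention`, and `generate_motion`, and the toy denoising loop are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical intention taxonomy and fallback; the paper's actual labels
# are not specified in the abstract.
INTENTIONS = ["handshake", "wave", "handover", "idle"]
FALLBACK = "idle"
CONF_THRESHOLD = 0.6  # assumed threshold for triggering the fallback behavior


def build_icl_prompt(observation: str, social_context: str) -> str:
    """Structured few-shot prompt asking the LLM for a label plus confidence."""
    examples = (
        "Observation: person extends right hand | Context: greeting\n"
        "Intention: handshake | Confidence: 0.9\n"
    )
    return (
        f"{examples}"
        f"Observation: {observation} | Context: {social_context}\n"
        "Intention:"
    )


def infer_intention(prompt: str) -> tuple[str, float]:
    """Stand-in for the in-context-learning LLM call (mocked here).

    A real system would parse a "label | confidence" completion from the model.
    """
    return "handshake", 0.85


def generate_motion(intention: str, steps: int = 50, latent_dim: int = 32) -> np.ndarray:
    """Toy latent-space denoising loop standing in for the diffusion model."""
    z = np.random.randn(latent_dim)    # start from Gaussian noise in latent space
    for t in range(steps, 0, -1):
        predicted_noise = 0.1 * z      # placeholder for the learned denoiser network
        z = z - predicted_noise / t    # one schematic denoising step
    return z                           # decoded into joint trajectories downstream


prompt = build_icl_prompt("person extends right hand", "greeting at lab entrance")
intention, confidence = infer_intention(prompt)
if confidence < CONF_THRESHOLD:
    intention = FALLBACK               # low confidence -> safe fallback behavior
motion_latent = generate_motion(intention)
print(intention, confidence, motion_latent.shape)
```

The hierarchical split mirrors the abstract's two levels: a slower reasoning layer that refines intention estimates under social context, and a fast generative layer that turns the selected intention into a motion latent.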
@article{bao2025_2506.01563,
  title={Hierarchical Intention-Aware Expressive Motion Generation for Humanoid Robots},
  author={Lingfan Bao and Yan Pan and Tianhu Peng and Chengxu Zhou},
  journal={arXiv preprint arXiv:2506.01563},
  year={2025}
}