AnimateAnywhere: Rouse the Background in Human Image Animation

Abstract

Human image animation aims to generate human videos of given characters and backgrounds that adhere to a desired pose sequence. However, existing methods focus on human actions while neglecting the generation of the background, which typically leads to static results or inharmonious movements. The community has explored camera pose-guided animation tasks, yet preparing a camera trajectory is impractical for most entertainment applications and ordinary users. As a remedy, we present AnimateAnywhere, a framework that rouses the background in human image animation without requiring camera trajectories. In particular, based on our key insight that the movement of the human body often reflects the motion of the background, we introduce a background motion learner (BML) to learn background motions from human pose sequences. To encourage the model to learn more accurate cross-frame correspondences, we further deploy an epipolar constraint on the 3D attention map. Specifically, the mask used to suppress geometrically unreasonable attention is carefully constructed by combining an epipolar mask and the current 3D attention map. Extensive experiments demonstrate that AnimateAnywhere effectively learns the background motion from human pose sequences, achieving state-of-the-art performance in generating human animation results with vivid and realistic backgrounds. The source code and model will be available at this https URL.
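To make the idea of an epipolar mask concrete, the following is a minimal NumPy sketch of the generic construction, not the authors' implementation. It assumes a known fundamental matrix `F` between two frames and marks, for every query pixel in the first frame, which key pixels in the second frame lie close to its epipolar line. The function name `epipolar_mask`, the `threshold` parameter, and the dense (H*W, H*W) layout are illustrative assumptions. The abstract does not say how the fundamental matrix is obtained without camera trajectories, nor how the epipolar mask is combined with the current 3D attention map, so neither step is shown.

```python
import numpy as np

def epipolar_mask(F, height, width, threshold=1.0):
    """Boolean (H*W, H*W) mask (hypothetical helper, not from the paper).

    Entry [q, k] is True when key pixel k in the second frame lies within
    `threshold` pixels of the epipolar line of query pixel q in the first frame.
    """
    # Homogeneous pixel coordinates (x, y, 1) in row-major order.
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    ones = np.ones(height * width)
    pts = np.stack([xs.ravel(), ys.ravel(), ones], axis=1).astype(float)  # (N, 3)

    # Epipolar line of each query pixel: l_q = F @ x_q, stored as rows.
    lines = pts @ F.T  # (N, 3)

    # Point-to-line distance |a*x + b*y + c| / sqrt(a^2 + b^2) for all pairs.
    numer = np.abs(lines @ pts.T)                               # (N, N)
    denom = np.linalg.norm(lines[:, :2], axis=1, keepdims=True)  # (N, 1)
    dist = numer / np.maximum(denom, 1e-8)

    return dist <= threshold


# Example: pure horizontal camera translation, so each epipolar line is the
# image row of the query pixel. F is the skew-symmetric matrix of t = (1, 0, 0).
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
mask = epipolar_mask(F, height=3, width=4, threshold=0.5)
```

In an attention layer, a mask of this kind is typically used to suppress logits for key positions that are geometrically implausible for a given query; the specific combination rule used by AnimateAnywhere is left to the full paper.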

@article{liu2025_2504.19834,
  title={AnimateAnywhere: Rouse the Background in Human Image Animation},
  author={Xiaoyu Liu and Mingshuai Yao and Yabo Zhang and Xianhui Lin and Peiran Ren and Xiaoming Li and Ming Liu and Wangmeng Zuo},
  journal={arXiv preprint arXiv:2504.19834},
  year={2025}
}