ANT: Adaptive Neural Temporal-Aware Text-to-Motion Model

3 June 2025

Main: 11 Pages

8 Figures

Bibliography: 1 Page

7 Tables

Appendix: 1 Page

Abstract

While diffusion models have advanced text-to-motion generation, their static semantic conditioning ignores temporal-frequency demands: early denoising requires structural semantics to lay motion foundations, while later stages need localized details for text alignment. This mismatch mirrors biological morphogenesis, where developmental phases demand distinct genetic programs. Inspired by the epigenetic regulation that governs morphological specialization, we propose **ANT**, an **A**daptive **N**eural **T**emporal-Aware architecture. ANT orchestrates semantic granularity through three components. **(i) Semantic Temporally Adaptive (STA) Module:** automatically partitions denoising into low-frequency structural planning and high-frequency refinement via spectral analysis. **(ii) Dynamic Classifier-Free Guidance scheduling (DCFG):** adaptively adjusts the conditional-to-unconditional ratio, enhancing efficiency while maintaining fidelity. **(iii) Temporal-semantic reweighting:** quantitatively aligns text influence with phase requirements. Extensive experiments show that ANT can be applied to various baselines, significantly improving their performance and achieving state-of-the-art semantic alignment on StableMoFusion.
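To make the idea behind DCFG concrete, the following is a minimal sketch of classifier-free guidance with a step-dependent guidance scale. The combination rule in `cfg_combine` is the standard classifier-free guidance formula; everything else is an assumption for illustration only. The abstract does not specify the schedule, so the linear ramp in `guidance_scale`, its direction, and the values `w_early` and `w_late` are hypothetical placeholders, not the paper's method.

```python
import numpy as np


def guidance_scale(t, num_steps, w_early=2.0, w_late=7.5):
    """Hypothetical linear guidance schedule (an assumption, not the paper's).

    t counts down from num_steps - 1 (noisiest step) to 0 (final step).
    """
    # progress is 0 at the start of denoising and 1 at the end
    progress = 1.0 - t / max(num_steps - 1, 1)
    return w_early + (w_late - w_early) * progress


def cfg_combine(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance: extrapolate from the
    unconditional prediction toward the conditional one by scale w."""
    return eps_uncond + w * (eps_cond - eps_uncond)


if __name__ == "__main__":
    num_steps = 50
    rng = np.random.default_rng(0)
    # Stand-ins for a denoiser's two predictions on a (frames, features) motion
    eps_uncond = rng.normal(size=(4, 3))
    eps_cond = rng.normal(size=(4, 3))
    for t in (num_steps - 1, num_steps // 2, 0):
        w = guidance_scale(t, num_steps)
        eps = cfg_combine(eps_uncond, eps_cond, w)
        print(f"t={t:2d}  w={w:.2f}  guided shape={eps.shape}")
```

With `w = 0` the result is the unconditional prediction and with `w = 1` it is the conditional one; a time-varying `w` changes how strongly the text steers each denoising step.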


@article{chen2025_2506.02452,
  title={ANT: Adaptive Neural Temporal-Aware Text-to-Motion Model},
  author={Wenshuo Chen and Kuimou Yu and Haozhe Jia and Kaishen Yuan and Bowen Tian and Songning Lai and Hongru Xiao and Erhang Zhang and Lei Wang and Yutao Yue},
  journal={arXiv preprint arXiv:2506.02452},
  year={2025}
}
