
MatchDance: Collaborative Mamba-Transformer Architecture Matching for High-Quality 3D Dance Synthesis

Abstract

Music-to-dance generation represents a challenging yet pivotal task at the intersection of choreography, virtual reality, and creative content generation. Despite its significance, existing methods face substantial limitations in achieving choreographic consistency. To address this challenge, we propose MatchDance, a novel framework for music-to-dance generation that constructs a latent representation to enhance choreographic consistency. MatchDance employs a two-stage design: (1) a Kinematic-Dynamic-based Quantization Stage (KDQS), which encodes dance motions into a latent representation via Finite Scalar Quantization (FSQ) with kinematic-dynamic constraints and reconstructs them with high fidelity, and (2) a Hybrid Music-to-Dance Generation Stage (HMDGS), which uses a Mamba-Transformer hybrid architecture to map music into the latent representation, followed by the KDQS decoder to generate 3D dance motions. Additionally, a music-dance retrieval framework and comprehensive metrics are introduced for evaluation. Extensive experiments on the FineDance dataset demonstrate state-of-the-art performance. Code will be released upon acceptance.
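To make the quantization stage concrete: Finite Scalar Quantization (FSQ), which KDQS builds on, replaces a learned VQ codebook with a fixed per-dimension grid — each latent channel is bounded and rounded to one of a small number of levels. The sketch below shows only this generic FSQ rounding step (the kinematic-dynamic constraints and the encoder/decoder of the paper are not reproduced); the function name and level choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fsq_quantize(z, levels):
    """Generic FSQ rounding sketch (not the paper's exact KDQS module).

    Each latent dimension i is squashed to [-1, 1] with tanh, scaled so that
    rounding yields `levels[i]` distinct integer codes, then rescaled back.
    """
    levels = np.asarray(levels, dtype=float)
    half = (levels - 1) / 2.0          # e.g. 8 levels -> codes in {-3.5..3.5} grid
    bounded = np.tanh(z) * half        # bound and scale each channel
    quantized = np.round(bounded)      # snap to the nearest integer code
    return quantized / half            # normalized quantized latent in [-1, 1]

# Toy usage: a 3-dim latent with 8 quantization levels per dimension.
z = np.array([0.3, -1.2, 2.5])
z_q = fsq_quantize(z, levels=[8, 8, 8])
```

During training, the rounding is typically made differentiable with a straight-through estimator, so gradients flow to the encoder despite the hard snap.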

@article{yang2025_2505.14222,
  title={MatchDance: Collaborative Mamba-Transformer Architecture Matching for High-Quality 3D Dance Synthesis},
  author={Kaixing Yang and Xulong Tang and Yuxuan Hu and Jiahao Yang and Hongyan Liu and Qinnan Zhang and Jun He and Zhaoxin Fan},
  journal={arXiv preprint arXiv:2505.14222},
  year={2025}
}