MusicInfuser: Making Video Diffusion Listen and Dance

18 March 2025
Susung Hong
Ira Kemelmacher-Shlizerman
Brian L. Curless
Steven M. Seitz
    VGen
Abstract

We introduce MusicInfuser, an approach for generating high-quality dance videos that are synchronized to a specified music track. Rather than attempting to design and train a new multimodal audio-video model, we show how existing video diffusion models can be adapted to align with musical inputs by introducing lightweight music-video cross-attention and a low-rank adapter. Unlike prior work requiring motion capture data, our approach fine-tunes only on dance videos. MusicInfuser achieves high-quality music-driven video generation while preserving the flexibility and generative capabilities of the underlying models. We introduce an evaluation framework using Video-LLMs to assess multiple dimensions of dance generation quality. The project page and code are available at this https URL.
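The two components named in the abstract, music-video cross-attention and a low-rank adapter, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, tensor sizes, single attention head, residual connection, and zero initialisation of `w_out` and `lora_b` are all assumptions chosen for illustration. It shows video tokens acting as queries over audio tokens, and a frozen weight matrix augmented with a trainable low-rank update.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def music_cross_attention(video_tokens, audio_tokens, w_q, w_k, w_v, w_out):
    """Video tokens (queries) attend over audio tokens (keys and values).

    The attended audio features are projected back to the video width and
    added residually (an assumed design, not taken from the paper).
    """
    q = matmul(video_tokens, w_q)
    k = matmul(audio_tokens, w_k)
    v = matmul(audio_tokens, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s * scale for s in row]) for row in scores]
    projected = matmul(matmul(weights, v), w_out)
    fused = [[x + y for x, y in zip(rv, rp)] for rv, rp in zip(video_tokens, projected)]
    return fused, weights


def lora_linear(x, w_frozen, lora_a, lora_b, alpha=1.0):
    """Compute x W + alpha * (x A) B; only A and B would be trained."""
    base = matmul(x, w_frozen)
    update = matmul(matmul(x, lora_a), lora_b)
    return [[b + alpha * u for b, u in zip(rb, ru)] for rb, ru in zip(base, update)]


# Hypothetical sizes: 5 video tokens, 3 audio tokens, adapter rank 2.
rng = random.Random(0)
d_video, d_audio, d_attn, rank = 8, 6, 4, 2
video = rand_matrix(5, d_video, rng, scale=1.0)
audio = rand_matrix(3, d_audio, rng, scale=1.0)

w_q = rand_matrix(d_video, d_attn, rng)
w_k = rand_matrix(d_audio, d_attn, rng)
w_v = rand_matrix(d_audio, d_attn, rng)
w_out = zeros(d_attn, d_video)  # assumed zero init: the new layer starts as a no-op
fused, weights = music_cross_attention(video, audio, w_q, w_k, w_v, w_out)

w_frozen = rand_matrix(d_video, d_video, rng)  # stands in for a pretrained weight
lora_a = rand_matrix(d_video, rank, rng)
lora_b = zeros(rank, d_video)  # assumed zero init: the adapter starts as a no-op
out = lora_linear(fused, w_frozen, lora_a, lora_b)
```

With both zero initialisations, the adapted model reproduces the frozen model's outputs before any fine-tuning, which is one common way to add conditioning without disturbing a pretrained network at the start of training. The added parameter count is small relative to the frozen weights: here `lora_a` and `lora_b` hold 2 × 8 × 2 = 32 values against 64 in `w_frozen`, and the gap widens as the layer width grows.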

@article{hong2025_2503.14505,
  title={MusicInfuser: Making Video Diffusion Listen and Dance},
  author={Susung Hong and Ira Kemelmacher-Shlizerman and Brian Curless and Steven M. Seitz},
  journal={arXiv preprint arXiv:2503.14505},
  year={2025}
}