VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models

27 March 2025
Chi-Pin Huang
Yen-Siang Wu
Hung-Kai Chung
Kai-Po Chang
Fu-En Yang
Yu-Chiang Frank Wang
Communities: DiffM, VGen
Abstract

Customized text-to-video generation aims to produce high-quality videos that incorporate user-specified subject identities or motion patterns. However, existing methods mainly focus on personalizing a single concept, either a subject identity or a motion pattern, which limits their effectiveness when multiple subjects must be composed with desired motion patterns. To tackle this challenge, we propose VideoMage, a unified framework for video customization over both multiple subjects and their interactive motions. VideoMage employs subject and motion LoRAs to capture personalized content from user-provided images and videos, along with an appearance-agnostic motion learning approach that disentangles motion patterns from visual appearance. Furthermore, we develop a spatial-temporal composition scheme to guide interactions among subjects within the desired motion patterns. Extensive experiments demonstrate that VideoMage outperforms existing methods, generating coherent, user-controlled videos with consistent subject identities and interactions.
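
The abstract describes attaching separate low-rank adapters (LoRAs) for subject identity and motion, then composing them at inference. The following is a minimal, self-contained PyTorch sketch of that general mechanism; all names here (LoRALinear, the toy projection layer, the additive composition) are illustrative assumptions, and the paper's actual appearance-agnostic training and spatial-temporal composition scheme are not reproduced.

import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen linear layer with a trainable low-rank update: W + (alpha/r) * B A."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)  # A
        self.up = nn.Linear(rank, base.out_features, bias=False)   # B
        nn.init.zeros_(self.up.weight)  # start as an identity update (no change)
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))


# Toy stand-in for an attention projection inside a video diffusion backbone.
proj = nn.Linear(64, 64)

# Separate adapters: one would be trained on subject images, one on motion videos.
subject_lora = LoRALinear(proj, rank=4)
motion_lora = LoRALinear(proj, rank=4)

x = torch.randn(2, 16, 64)  # (batch, tokens, hidden dim)
# One simple way to compose two learned concepts at inference is to sum both
# low-rank residuals on top of the shared frozen weights; this is an assumption
# for illustration, not the paper's composition scheme.
y = (
    proj(x)
    + subject_lora.scale * subject_lora.up(subject_lora.down(x))
    + motion_lora.scale * motion_lora.up(motion_lora.down(x))
)
print(y.shape)  # torch.Size([2, 16, 64])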

@article{huang2025_2503.21781,
  title={VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models},
  author={Chi-Pin Huang and Yen-Siang Wu and Hung-Kai Chung and Kai-Po Chang and Fu-En Yang and Yu-Chiang Frank Wang},
  journal={arXiv preprint arXiv:2503.21781},
  year={2025}
}