ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2503.05978
50
1

MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice

7 March 2025
Hongwei Yi
Tian Ye
Shitong Shao
Xuancheng Yang
Jiantong Zhao
Hanzhong Guo
Terrance Wang
Q. Yin
Zeke Xie
Lei Zhu
Wei Li
Michael Lingelbach
Daquan Zhou
    VGen
ArXivPDFHTML
Abstract

We present MagicInfinite, a novel diffusion Transformer (DiT) framework that overcomes traditional portrait animation limitations, delivering high-fidelity results across diverse character types-realistic humans, full-body figures, and stylized anime characters. It supports varied facial poses, including back-facing views, and animates single or multiple characters with input masks for precise speaker designation in multi-character scenes. Our approach tackles key challenges with three innovations: (1) 3D full-attention mechanisms with a sliding window denoising strategy, enabling infinite video generation with temporal coherence and visual quality across diverse character styles; (2) a two-stage curriculum learning scheme, integrating audio for lip sync, text for expressive dynamics, and reference images for identity preservation, enabling flexible multi-modal control over long sequences; and (3) region-specific masks with adaptive loss functions to balance global textual control and local audio guidance, supporting speaker-specific animations. Efficiency is enhanced via our innovative unified step and cfg distillation techniques, achieving a 20x inference speed boost over the basemodel: generating a 10 second 540x540p video in 10 seconds or 720x720p in 30 seconds on 8 H100 GPUs, without quality loss. Evaluations on our new benchmark demonstrate MagicInfinite's superiority in audio-lip synchronization, identity preservation, and motion naturalness across diverse scenarios. It is publicly available atthis https URL, with examples atthis https URL.

View on arXiv
@article{yi2025_2503.05978,
  title={ MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice },
  author={ Hongwei Yi and Tian Ye and Shitong Shao and Xuancheng Yang and Jiantong Zhao and Hanzhong Guo and Terrance Wang and Qingyu Yin and Zeke Xie and Lei Zhu and Wei Li and Michael Lingelbach and Daquan Zhou },
  journal={arXiv preprint arXiv:2503.05978},
  year={ 2025 }
}
Comments on this paper