Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation

24 May 2025
Chika Maduabuchi
Hao Chen
Yujin Han
Jindong Wang
Abstract

Latent Video Diffusion Models (LVDMs) achieve high-quality generation but are sensitive to imperfect conditioning, which causes semantic drift and temporal incoherence on noisy, web-scale video-text datasets. We introduce CAT-LVDM, the first corruption-aware training framework for LVDMs that improves robustness through structured, data-aligned noise injection. Our method includes Batch-Centered Noise Injection (BCNI), which perturbs embeddings along intra-batch semantic directions to preserve temporal consistency. BCNI is especially effective on caption-rich datasets like WebVid-2M, MSR-VTT, and MSVD. We also propose Spectrum-Aware Contextual Noise (SACN), which injects noise along dominant spectral directions to improve low-frequency smoothness, showing strong results on UCF-101. On average, BCNI reduces FVD by 31.9% across WebVid-2M, MSR-VTT, and MSVD, while SACN yields a 12.3% improvement on UCF-101. Ablation studies confirm the benefit of low-rank, data-aligned noise. Our theoretical analysis further explains how such perturbations tighten entropy, Wasserstein, score-drift, mixing-time, and generalization bounds. CAT-LVDM establishes a principled, scalable training approach for robust video diffusion under multimodal noise. Code and models: this https URL
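
The abstract only sketches the two injection schemes, so the following is a minimal illustrative interpretation rather than the authors' implementation. It assumes text embeddings of shape (batch, dim) and hypothetical parameters sigma (noise scale) and k (spectral rank); BCNI is read as perturbing each embedding along its direction relative to the batch mean, and SACN as adding low-rank noise along the batch's top singular directions.

```python
import torch


def bcni(text_emb: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """Batch-Centered Noise Injection (illustrative sketch, not the paper's code).

    Perturbs each embedding along intra-batch semantic directions, here taken
    to be the offsets from the batch mean, so the noise stays within the
    semantic span of the current batch.
    """
    batch_mean = text_emb.mean(dim=0, keepdim=True)       # (1, D)
    centered = text_emb - batch_mean                       # intra-batch directions
    eps = torch.randn(text_emb.size(0), 1, device=text_emb.device)
    return text_emb + sigma * eps * centered


def sacn(text_emb: torch.Tensor, sigma: float = 0.1, k: int = 4) -> torch.Tensor:
    """Spectrum-Aware Contextual Noise (illustrative sketch).

    Injects noise only along the top-k spectral (principal) directions of the
    batch, i.e. a low-rank, data-aligned perturbation.
    """
    centered = text_emb - text_emb.mean(dim=0, keepdim=True)
    # Right singular vectors of the centered batch span its dominant spectral directions.
    _, _, vh = torch.linalg.svd(centered, full_matrices=False)
    top_dirs = vh[:k]                                      # (k, D)
    coeffs = sigma * torch.randn(text_emb.size(0), k, device=text_emb.device)
    return text_emb + coeffs @ top_dirs
```

In both sketches the perturbation is applied to the conditioning embeddings before they are fed to the diffusion model during training, consistent with the paper's framing of corruption-aware training as structured noise injection on the conditioning side.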

@article{maduabuchi2025_2505.21545,
  title={Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation},
  author={Chika Maduabuchi and Hao Chen and Yujin Han and Jindong Wang},
  journal={arXiv preprint arXiv:2505.21545},
  year={2025}
}