ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos

20 March 2025
Haolin Yang, Feilong Tang, Ming Hu, Yulong Li, Junjie Guo, Yexin Liu, Zelin Peng, Junjun He, Zongyuan Ge
Main: 9 pages, 9 figures, 5 tables; bibliography: 8 pages; appendix: 11 pages
Abstract

Video diffusion models (VDMs) enable the generation of high-quality videos, with current research concentrated predominantly on scaling during training through improvements in data quality, computational resources, and model complexity. Inference-time scaling has received far less attention, with most approaches limiting models to a single generation attempt. Recent studies have uncovered the existence of "golden noises" that can enhance video quality during generation. Building on this, we find that guiding an inference-time search of VDMs toward better noise candidates not only evaluates the quality of the frames generated at the current step but also preserves high-level object features by referencing anchor frames from previously generated chunks, thereby delivering long-term value. Our analysis reveals that diffusion models inherently allow computation to be adjusted flexibly by varying the number of denoising steps, and that even one-step denoising, when guided by a reward signal, yields significant long-term benefits. Based on this observation, we propose ScalingNoise, a plug-and-play inference-time search strategy that identifies golden initial noises for the diffusion sampling process to improve global content consistency and visual diversity. Specifically, we perform one-step denoising to convert initial noises into a clip and then evaluate its long-term value using a reward model anchored on previously generated content. To preserve diversity, we sample candidates from a tilted noise distribution that up-weights promising noises. In this way, ScalingNoise significantly reduces noise-induced errors, ensuring more coherent and spatiotemporally consistent video generation. Extensive experiments on benchmark datasets demonstrate that ScalingNoise effectively improves long video generation.
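To make the search loop concrete, below is a minimal Python/PyTorch sketch of the candidate-selection procedure the abstract describes: sample candidate initial noises from a tilted Gaussian, preview each with one denoising step, and keep the candidate that scores best under an anchor-based reward. The names one_step_denoise and reward_model, the tilt parameterization, and all defaults are illustrative assumptions, not the authors' implementation.

import torch

def scaling_noise_search(
    one_step_denoise,   # hypothetical: maps an initial noise to a rough clip preview
    reward_model,       # hypothetical: scores a clip's consistency with an anchor frame
    anchor_frame,       # anchor from the previously generated chunk(s)
    noise_shape,        # e.g. (frames, channels, height, width)
    num_candidates=8,   # number of initial noises to evaluate
    mean_shift=0.0,     # tilt: shift the Gaussian toward promising noise regions
    std_scale=1.0,      # tilt: rescale its spread
):
    """Select a 'golden' initial noise for the next video chunk.

    Samples candidates from a tilted noise distribution, cheaply previews
    each with a single denoising step, and returns the candidate whose
    preview best matches the anchor under the reward model.
    """
    best_noise, best_score = None, float("-inf")
    for _ in range(num_candidates):
        # Tilted distribution that up-weights promising noises.
        noise = mean_shift + std_scale * torch.randn(noise_shape)
        # One-step denoising gives a cheap preview of the final clip.
        preview = one_step_denoise(noise)
        # Long-term value: consistency with previously generated content.
        score = reward_model(preview, anchor_frame)
        if score > best_score:
            best_noise, best_score = noise, score
    return best_noise

The chosen noise is then handed to the full multi-step sampler; because the preview costs only one denoising step per candidate, the search adds little overhead relative to generating the chunk itself.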

@article{yang2025_2503.16400,
  title={ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos},
  author={Haolin Yang and Feilong Tang and Ming Hu and Qingyu Yin and Yulong Li and Yexin Liu and Zelin Peng and Peng Gao and Junjun He and Zongyuan Ge and Imran Razzak},
  journal={arXiv preprint arXiv:2503.16400},
  year={2025}
}