Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis

Main: 9 pages · Appendix: 10 pages · Bibliography: 4 pages · 20 figures · 8 tables
Abstract

Video generative models can be regarded as world simulators due to their ability to capture dynamic, continuous changes inherent in real-world environments. These models integrate high-dimensional information across visual, temporal, spatial, and causal dimensions, enabling predictions of subjects in various states. A natural and valuable research direction is to explore whether a fully trained video generative model in high-dimensional space can effectively support lower-dimensional tasks such as controllable image generation. In this work, we propose a paradigm for video-to-image knowledge compression and task adaptation, termed \textit{Dimension-Reduction Attack} (\texttt{DRA-Ctrl}), which leverages the strengths of video models, including long-range context modeling and flattened full attention, to perform various generation tasks. Specifically, to bridge the challenging gap between continuous video frames and discrete image generation, we introduce a mixup-based transition strategy that ensures smooth adaptation. Moreover, we redesign the attention structure with a tailored masking mechanism to better align text prompts with image-level control. Experiments across diverse image generation tasks, such as subject-driven and spatially conditioned generation, show that repurposed video models outperform those trained directly on images. These results highlight the untapped potential of large-scale video generators for broader visual applications. \texttt{DRA-Ctrl} provides new insights into reusing resource-intensive video models and lays the foundation for future unified generative models across visual modalities. The project page is this https URL.
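The abstract does not detail how the mixup-based transition works; below is a minimal sketch of one plausible reading, in which video-frame latents are linearly blended with a repeated single-image latent under a mixup coefficient that is annealed over training, moving the model gradually from continuous frames toward a discrete image target. The function name `mixup_transition`, the tensor shapes, and the annealing schedule are illustrative assumptions, not the authors' implementation.

```python
import torch

def mixup_transition(video_latents: torch.Tensor,
                     image_latent: torch.Tensor,
                     lam: float) -> torch.Tensor:
    """Blend a clip of video-frame latents with a repeated target-image latent.

    video_latents: (T, C, H, W) latents in the pretrained video model's domain.
    image_latent:  (C, H, W) latent of the single target image.
    lam: mixup coefficient in [0, 1]; annealing it from 1 to 0 shifts training
         smoothly from continuous video frames to a discrete image target.
    """
    repeated = image_latent.unsqueeze(0).expand_as(video_latents)
    return lam * video_latents + (1.0 - lam) * repeated

# Illustrative linear annealing of the mixup coefficient over training steps.
T, C, H, W = 8, 4, 32, 32
video_latents = torch.randn(T, C, H, W)
image_latent = torch.randn(C, H, W)
for step in (0, 500, 1000):
    lam = max(0.0, 1.0 - step / 1000)
    mixed = mixup_transition(video_latents, image_latent, lam)
    print(step, lam, tuple(mixed.shape))
```

Under this reading, early steps keep the inputs close to the video distribution the model was pretrained on, while late steps collapse the clip to a repeated still image, which is what makes the adaptation "smooth" in the sense the abstract describes.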

@article{cao2025_2505.23325,
  title={Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis},
  author={Hengyuan Cao and Yutong Feng and Biao Gong and Yijing Tian and Yunhong Lu and Chuang Liu and Bin Wang},
  journal={arXiv preprint arXiv:2505.23325},
  year={2025}
}