Efficient Autoregressive Video Diffusion with Dummy Head

Hang Guo
Zhaoyang Jia
Jiahao Li
Bin Li
Yuanhao Cai
Jiangshan Wang
Yawei Li
Yan Lu
Main: 7 Pages
23 Figures
Bibliography: 3 Pages
10 Tables
Appendix: 15 Pages
Abstract

Autoregressive video diffusion models have recently gained considerable research interest due to their causal modeling and iterative denoising. In this work, we identify that the multi-head self-attention in these models under-utilizes historical frames: approximately 25% of heads attend almost exclusively to the current frame, and discarding their KV caches incurs only minor performance degradation. Building on this, we propose Dummy Forcing, a simple yet effective method to control context accessibility across different heads. Specifically, the proposed heterogeneous memory allocation reduces head-wise context redundancy, accompanied by dynamic head programming to adaptively classify head types. Moreover, we develop a context packing technique to achieve more aggressive cache compression. Without additional training, our Dummy Forcing delivers up to 2.0x speedup over the baseline, supporting video generation at 24.3 FPS with less than 0.5% quality drop. Project page is available at this https URL.
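To make the head-wise observation concrete, below is a minimal sketch (not the authors' implementation) of how one might flag "dummy" heads: heads whose average attention mass on historical-frame tokens falls below a small threshold, making their historical KV cache a candidate for eviction. All names (`classify_dummy_heads`, `hist_mass`, the threshold value) are illustrative assumptions.

```python
# Hypothetical sketch: flag heads that attend almost exclusively to the current frame.
import torch

def classify_dummy_heads(attn: torch.Tensor, num_current: int, threshold: float = 0.05):
    """
    attn: [num_heads, q_len, kv_len] softmax attention weights for one layer,
          where the last `num_current` KV positions belong to the current frame
          and the preceding positions belong to historical (cached) frames.
    Returns a boolean mask over heads: True for heads whose attention mass on
    historical frames is negligible (candidates for dropping their KV cache).
    """
    hist_len = attn.shape[-1] - num_current
    # Average attention mass each head places on historical-frame tokens.
    hist_mass = attn[..., :hist_len].sum(dim=-1).mean(dim=-1)  # [num_heads]
    return hist_mass < threshold

# Usage: heads flagged as "dummy" would keep only current-frame KV entries,
# so their historical cache can be evicted with little quality loss.
attn = torch.softmax(torch.randn(16, 256, 1024), dim=-1)
dummy_mask = classify_dummy_heads(attn, num_current=256)
print(f"{dummy_mask.sum().item()} / {attn.shape[0]} heads flagged as dummy")
```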
