
Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation

Zhuoyang Zhang
Luke J. Huang
Chengyue Wu
Shang Yang
Kelly Peng
Yao Lu
Song Han
Main: 11 pages, 13 figures, 4 tables; bibliography: 5 pages; appendix: 4 pages
Abstract

We present Locality-aware Parallel Decoding (LPD) to accelerate autoregressive image generation. Traditional autoregressive image generation relies on next-patch prediction, a memory-bound process that leads to high latency. Existing works have tried to parallelize next-patch prediction by shifting to multi-patch prediction, but they achieve only limited parallelization. To achieve high parallelization while maintaining generation quality, we introduce two key techniques: (1) Flexible Parallelized Autoregressive Modeling, a novel architecture that enables arbitrary generation orderings and degrees of parallelization. It uses learnable position query tokens to guide generation at target positions while ensuring mutual visibility among concurrently generated tokens for consistent parallel decoding. (2) Locality-aware Generation Ordering, a novel schedule that forms groups to minimize intra-group dependencies and maximize contextual support, enhancing generation quality. With these designs, we reduce the number of generation steps from 256 to 20 (256×256 resolution) and from 1024 to 48 (512×512 resolution) without compromising quality on ImageNet class-conditional generation, achieving at least 3.4× lower latency than previous parallelized autoregressive models.
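The two mechanisms named in the abstract can be illustrated with a small sketch. The Python below is a hypothetical illustration, not the authors' implementation. `locality_aware_groups` is an assumed greedy heuristic that ranks grid positions by their distance to already-generated tokens (contextual support) and keeps members of the same group at least `min_dist` apart (low intra-group dependency). `block_causal_mask` builds a simplified attention mask in which each token sees every earlier group and every member of its own group (mutual visibility). The paper's actual token layout, including how the position query tokens are placed, and its exact ordering schedule may differ; the Chebyshev distance and all function names here are assumptions.

```python
from typing import List, Tuple

Pos = Tuple[int, int]  # (row, col) on the token grid


def chebyshev(a: Pos, b: Pos) -> int:
    # Assumed locality measure: Chebyshev distance between grid positions.
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def context_distance(p: Pos, done: List[Pos]) -> int:
    # Distance to the nearest already-generated token (0 before the first step).
    return min((chebyshev(p, q) for q in done), default=0)


def locality_aware_groups(h: int, w: int, group_sizes: List[int],
                          min_dist: int = 2) -> List[List[Pos]]:
    """Hypothetical greedy schedule that partitions an h x w grid into groups."""
    assert sum(group_sizes) == h * w, "group sizes must cover the whole grid"
    remaining = [(r, c) for r in range(h) for c in range(w)]
    done: List[Pos] = []
    groups: List[List[Pos]] = []
    for size in group_sizes:
        # Contextual support: rank positions by closeness to generated tokens.
        candidates = sorted(remaining, key=lambda p: (context_distance(p, done), p))
        group: List[Pos] = []
        # Low intra-group dependency: keep members at least min_dist apart.
        for p in candidates:
            if len(group) == size:
                break
            if all(chebyshev(p, s) >= min_dist for s in group):
                group.append(p)
        # If spacing cannot be satisfied, fill with the closest leftover positions.
        for p in candidates:
            if len(group) == size:
                break
            if p not in group:
                group.append(p)
        groups.append(group)
        done.extend(group)
        remaining = [p for p in remaining if p not in group]
    return groups


def block_causal_mask(group_sizes: List[int]) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j.

    Tokens see all earlier groups and every member of their own group
    (mutual visibility), but nothing from later groups.
    """
    group_of: List[int] = []
    for g, size in enumerate(group_sizes):
        group_of.extend([g] * size)
    n = len(group_of)
    return [[group_of[j] <= group_of[i] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    sizes = [1, 3, 12]  # hypothetical schedule for a 4 x 4 grid
    for step, group in enumerate(locality_aware_groups(4, 4, sizes)):
        print(step, group)
```

With a 4×4 grid and the hypothetical schedule `[1, 3, 12]`, the first group is `[(0, 0)]` and the second is `[(0, 1), (2, 0), (2, 2)]`: positions adjacent to the context are preferred, but neighbours of an already-chosen group member are skipped. Assuming a 16×16 token grid for 256×256 images, the abstract's 20 steps would correspond to a `group_sizes` list with 20 entries summing to 256, or 12.8 tokens per step on average; the per-step sizes used in the paper are not stated here.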

@article{zhang2025_2507.01957,
  title={Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation},
  author={Zhuoyang Zhang and Luke J. Huang and Chengyue Wu and Shang Yang and Kelly Peng and Yao Lu and Song Han},
  journal={arXiv preprint arXiv:2507.01957},
  year={2025}
}