Neighboring Autoregressive Modeling for Efficient Visual Generation

12 March 2025

Papers citing "Neighboring Autoregressive Modeling for Efficient Visual Generation"

29 / 29 papers shown

Title
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities Wei Wei Jintao Guo Shanshan Zhao Minghao Fu Lunhao Duan Guo-Hua Wang Qing-Guo Chen Zhao Xu Weihua Luo Kaifu Zhang DiffM 293 1 0 05 May 2025
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling Xiaokang Chen Zhiyu Wu Xingchao Liu Zizheng Pan Wen Liu Zhenda Xie X. Yu Chong Ruan AI4TS 152 160 0 29 Jan 2025
Parallelized Autoregressive Visual Generation Yanjie Wang Shuhuai Ren Zhijie Lin Yujin Han Haoyuan Guo Zhenheng Yang Difan Zou Jiashi Feng Xihui Liu VGen 182 17 0 19 Dec 2024
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis J. N. Han Jinlai Liu Yi Jiang Bin Yan Yuqi Zhang Zehuan Yuan Bingyue Peng Xiaobing Liu 122 52 0 05 Dec 2024
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior Hanyu Wang Saksham Suri Yixuan Ren Hao Chen Abhinav Shrivastava VGen 77 12 0 28 Oct 2024
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding Yao Teng Han Shi Xian Liu Xuefei Ning Guohao Dai Yu Wang Zhenguo Li Xihui Liu 118 16 0 02 Oct 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining Dongyang Liu Shitian Zhao Le Zhuo Weifeng Lin Ping Luo Xinyue Li Qi Qin Yu Qiao Hongsheng Li Peng Gao MLLM 151 59 0 05 Aug 2024
Autoregressive Image Generation without Vector Quantization Tianhong Li Yonglong Tian He Li Mingyang Deng Kaiming He DiffM 138 238 0 17 Jun 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models Chameleon Team MLLM 201 339 0 16 May 2024
Scalable Diffusion Models with Transformers William S. Peebles Saining Xie GNN 122 2,436 0 19 Dec 2022
MAGVIT: Masked Generative Video Transformer Lijun Yu Yong Cheng Kihyuk Sohn José Lezama Han Zhang ... Alexander G. Hauptmann Ming-Hsuan Yang Yuan Hao Irfan Essa Lu Jiang DiffM VGen 82 248 0 10 Dec 2022
Scaling Instruction-Finetuned Language Models Hyung Won Chung Le Hou Shayne Longpre Barret Zoph Yi Tay ... Jacob Devlin Adam Roberts Denny Zhou Quoc V. Le Jason W. Wei ReLM LRM 234 3,158 0 20 Oct 2022
Make-A-Video: Text-to-Video Generation without Text-Video Data Uriel Singer Adam Polyak Thomas Hayes Xiaoyue Yin Jie An ... Oron Ashual Oran Gafni Devi Parikh Sonal Gupta Yaniv Taigman DiffM VGen 85 1,434 0 29 Sep 2022
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers Wenyi Hong Ming Ding Wendi Zheng Xinghan Liu Jie Tang DiffM 316 631 0 29 May 2022
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer Songwei Ge Thomas Hayes Harry Yang Xiaoyue Yin Guan Pang David Jacobs Jia-Bin Huang Devi Parikh ViT 135 223 0 07 Apr 2022
High-Resolution Image Synthesis with Latent Diffusion Models Robin Rombach A. Blattmann Dominik Lorenz Patrick Esser Bjorn Ommer 3DV 511 15,788 0 20 Dec 2021
Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding Zizhao Zhang Han Zhang Long Zhao Ting Chen Sercan O. Arik Tomas Pfister ViT 79 173 0 26 May 2021
Diffusion Models Beat GANs on Image Synthesis Prafulla Dhariwal Alex Nichol 308 7,958 0 11 May 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai ... Matthias Minderer G. Heigold Sylvain Gelly Jakob Uszkoreit N. Houlsby ViT 684 41,563 0 22 Oct 2020
End-to-End Object Detection with Transformers Nicolas Carion Francisco Massa Gabriel Synnaeve Nicolas Usunier Alexander Kirillov Sergey Zagoruyko ViT 3DV PINN 456 13,130 0 26 May 2020
Painting Outside the Box: Image Outpainting with GANs Mark Sabini Gili Rusak GAN 56 71 0 25 Aug 2018
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 808 132,725 0 12 Jun 2017
Mask R-CNN Kaiming He Georgia Gkioxari Piotr Dollár Ross B. Girshick ObjD 387 27,275 0 20 Mar 2017
Conditional Image Generation with PixelCNN Decoders Aaron van den Oord Nal Kalchbrenner Oriol Vinyals L. Espeholt Alex Graves Koray Kavukcuoglu VLM 235 2,519 0 16 Jun 2016
Fully Convolutional Networks for Semantic Segmentation Evan Shelhamer Jonathan Long Trevor Darrell VOS SSeg 755 37,927 0 20 May 2016
Deep Residual Learning for Image Recognition Kaiming He Xinming Zhang Shaoqing Ren Jian Sun MedIm 2.3K 194,641 0 10 Dec 2015
You Only Look Once: Unified, Real-Time Object Detection Joseph Redmon S. Divvala Ross B. Girshick Ali Farhadi ObjD 742 37,085 0 08 Jun 2015
Very Deep Convolutional Networks for Large-Scale Image Recognition Karen Simonyan Andrew Zisserman FAtt MDE 1.7K 100,575 0 04 Sep 2014
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild K. Soomro Amir Zamir M. Shah CLIP VGen 165 6,170 0 03 Dec 2012