Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2411.05902
Cited By
v1
v2 (latest)
Autoregressive Models in Vision: A Survey
8 November 2024
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
Yao Mu
Yuan Yao
Jikang Cheng
Zhongwei Wan
Jinfa Huang
Chaofan Tao
Shen Yan
Huaxiu Yao
Lingpeng Kong
Hongxia Yang
Hao Fei
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
Re-assign community
ArXiv (abs)
PDF
HTML
Github (625★)
Papers citing
"Autoregressive Models in Vision: A Survey"
50 / 211 papers shown
Title
Mastering Diverse Domains through World Models
Danijar Hafner
J. Pašukonis
Jimmy Ba
Timothy Lillicrap
72
612
0
10 Jan 2023
Scaling Laws for Generative Mixed-Modal Language Models
Armen Aghajanyan
L. Yu
Alexis Conneau
Wei-Ning Hsu
Karen Hambardzumyan
Susan Zhang
Stephen Roller
Naman Goyal
Omer Levy
Luke Zettlemoyer
MoE
VLM
80
108
0
10 Jan 2023
Muse: Text-To-Image Generation via Masked Generative Transformers
Huiwen Chang
Han Zhang
Jarred Barber
AJ Maschinot
José Lezama
...
Kevin Patrick Murphy
William T. Freeman
Michael Rubinstein
Yuanzhen Li
Dilip Krishnan
DiffM
266
556
0
02 Jan 2023
Scalable Diffusion Models with Transformers
William S. Peebles
Saining Xie
GNN
118
2,418
0
19 Dec 2022
MAGVIT: Masked Generative Video Transformer
Lijun Yu
Yong Cheng
Kihyuk Sohn
José Lezama
Han Zhang
...
Alexander G. Hauptmann
Ming-Hsuan Yang
Yuan Hao
Irfan Essa
Lu Jiang
DiffM
VGen
77
248
0
10 Dec 2022
Exploring Stochastic Autoregressive Image Modeling for Visual Representation
Yu-Hang Qi
Fan Yang
Yousong Zhu
Yufei Liu
Liwei Wu
Rui Zhao
Wei Li
DiffM
36
13
0
03 Dec 2022
Make-A-Story: Visual Memory Conditioned Consistent Story Generation
Tanzila Rahman
Hsin-Ying Lee
Jian Ren
Sergey Tulyakov
Shweta Mahajan
Leonid Sigal
DiffM
97
70
0
23 Nov 2022
Retrieval-Augmented Multimodal Language Modeling
Michihiro Yasunaga
Armen Aghajanyan
Weijia Shi
Rich James
J. Leskovec
Percy Liang
M. Lewis
Luke Zettlemoyer
Wen-tau Yih
RALM
61
108
0
22 Nov 2022
MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
Tianhong Li
Huiwen Chang
Shlok Kumar Mishra
Han Zhang
Dina Katabi
Dilip Krishnan
72
169
0
16 Nov 2022
Phenaki: Variable Length Video Generation From Open Domain Textual Description
Ruben Villegas
Mohammad Babaeizadeh
Pieter-Jan Kindermans
Hernan Moraldo
Han Zhang
M. Saffar
Santiago Castro
Julius Kunze
D. Erhan
DiffM
VGen
133
395
0
05 Oct 2022
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation
Chuanxia Zheng
L. Vuong
Jianfei Cai
Dinh Q. Phung
MQ
132
79
0
19 Sep 2022
HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator
Younggyo Seo
Kimin Lee
Fangchen Liu
Stephen James
Pieter Abbeel
VGen
59
29
0
15 Sep 2022
Morphology-preserving Autoregressive 3D Generative Modelling of the Brain
Petru-Daniel Tudosiu
W. H. Pinaya
M. Graham
Pedro Borges
Virginia Fernandez
...
Disha Mehra
M. Vella
P. Nachev
Sebastien Ourselin
M. Jorge Cardoso
3DH
DiffM
MedIm
49
21
0
07 Sep 2022
Visual Prompting via Image Inpainting
Amir Bar
Yossi Gandelsman
Trevor Darrell
Amir Globerson
Alexei A. Efros
VLM
VPVLM
60
211
0
01 Sep 2022
Transformers are Sample-Efficient World Models
Vincent Micheli
Eloi Alonso
Franccois Fleuret
VLM
OffRL
147
187
0
01 Sep 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Wenhui Wang
Hangbo Bao
Li Dong
Johan Bjorck
Zhiliang Peng
...
Kriti Aggarwal
O. Mohammed
Saksham Singhal
Subhojit Som
Furu Wei
MLLM
VLM
ViT
148
644
0
22 Aug 2022
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
66
319
0
12 Aug 2022
Classifier-Free Diffusion Guidance
Jonathan Ho
Tim Salimans
FaML
196
3,963
0
26 Jul 2022
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
Chenfei Wu
Jian Liang
Xiaowei Hu
Zhe Gan
Jianfeng Wang
Lijuan Wang
Zicheng Liu
Yuejian Fang
Nan Duan
VGen
75
74
0
20 Jul 2022
Outpainting by Queries
Kai Yao
Penglei Gao
Xi Yang
Kaizhu Huang
Jie Sun
Rui Zhang
ViT
76
13
0
12 Jul 2022
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
...
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
192
1,129
0
22 Jun 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
141
411
0
17 Jun 2022
VL-BEiT: Generative Vision-Language Pretraining
Hangbo Bao
Wenhui Wang
Li Dong
Furu Wei
VLM
72
45
0
02 Jun 2022
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
314
629
0
29 May 2022
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Chitwan Saharia
William Chan
Saurabh Saxena
Lala Li
Jay Whang
...
Raphael Gontijo-Lopes
Tim Salimans
Jonathan Ho
David J Fleet
Mohammad Norouzi
VLM
461
6,067
0
23 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
418
3,602
0
29 Apr 2022
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers
Ming Ding
Wendi Zheng
Wenyi Hong
Jie Tang
VLM
109
334
0
28 Apr 2022
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance
Katherine Crowson
Stella Biderman
Daniel Kornis
Dashiell Stander
Eric Hallahan
Louis Castricato
Edward Raff
CLIP
130
380
0
18 Apr 2022
Unconditional Image-Text Pair Generation with Multimodal Cross Quantizer
Hyungyu Lee
Sungjin Park
Joonseok Lee
Edward Choi
42
2
0
15 Apr 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
413
6,908
0
13 Apr 2022
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer
Songwei Ge
Thomas Hayes
Harry Yang
Xiaoyue Yin
Guan Pang
David Jacobs
Jia-Bin Huang
Devi Parikh
ViT
125
223
0
07 Apr 2022
Autoregressive 3D Shape Generation via Canonical Mapping
A. Cheng
Xueting Li
Sifei Liu
Min Sun
Ming-Hsuan Yang
3DPC
80
41
0
05 Apr 2022
HiT-DVAE: Human Motion Generation via Hierarchical Transformer Dynamical VAE
Xiaoyu Bie
Wen Guo
Simon Leglaive
Laurent Girin
Francesc Moreno-Noguer
Xavier Alameda-Pineda
VGen
3DH
88
13
0
04 Apr 2022
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
208
1,976
0
29 Mar 2022
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Oran Gafni
Adam Polyak
Oron Ashual
Shelly Sheynin
Devi Parikh
Yaniv Taigman
DiffM
81
524
0
24 Mar 2022
Temporal Difference Learning for Model Predictive Control
Nicklas Hansen
Xiaolong Wang
H. Su
PINN
MU
93
253
0
09 Mar 2022
Autoregressive Image Generation using Residual Quantization
Doyup Lee
Chiheon Kim
Saehoon Kim
Minsu Cho
Wook-Shin Han
VGen
271
375
0
03 Mar 2022
MaskGIT: Masked Generative Image Transformer
Huiwen Chang
Han Zhang
Lu Jiang
Ce Liu
William T. Freeman
ViT
153
695
0
08 Feb 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM
ObjD
154
880
0
07 Feb 2022
StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets
Axel Sauer
Katja Schwarz
Andreas Geiger
268
512
0
01 Feb 2022
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
Han Zhang
Weichong Yin
Yewei Fang
Lanxin Li
Boqiang Duan
Zhihua Wu
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
59
59
0
31 Dec 2021
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
482
15,734
0
20 Dec 2021
Make It Move: Controllable Image-to-Video Generation with Text Descriptions
Yaosi Hu
Chong Luo
Zhenzhong Chen
VGen
54
89
0
06 Dec 2021
Vector Quantized Diffusion Model for Text-to-Image Synthesis
Shuyang Gu
Dong Chen
Jianmin Bao
Fang Wen
Bo Zhang
Dongdong Chen
Lu Yuan
B. Guo
DiffM
148
798
0
29 Nov 2021
Octree Transformer: Autoregressive 3D Shape Generation on Hierarchically Structured Sequences
Moritz Ibing
Gregor Kobsik
Leif Kobbelt
84
37
0
24 Nov 2021
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Chenfei Wu
Jian Liang
Lei Ji
Fan Yang
Yuejian Fang
Daxin Jiang
Nan Duan
ViT
VGen
75
296
0
24 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
470
7,819
0
11 Nov 2021
Vector-quantized Image Modeling with Improved VQGAN
Jiahui Yu
Xin Li
Jing Yu Koh
Han Zhang
Ruoming Pang
James Qin
Alexander Ku
Yuanzhong Xu
Jason Baldridge
Yonghui Wu
ViT
VLM
DRL
125
526
0
09 Oct 2021
ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis
Patrick Esser
Robin Rombach
A. Blattmann
Bjorn Ommer
DiffM
99
161
0
19 Aug 2021
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
289
2,841
0
15 Jun 2021
Previous
1
2
3
4
5
Next