2411.05902
Cited By
v1
v2 (latest)
Autoregressive Models in Vision: A Survey
8 November 2024
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
Yao Mu
Yuan Yao
Jikang Cheng
Zhongwei Wan
Jinfa Huang
Chaofan Tao
Shen Yan
Huaxiu Yao
Lingpeng Kong
Hongxia Yang
Hao Fei
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
Github (625★)
Papers citing "Autoregressive Models in Vision: A Survey"
50 / 211 papers shown
Title
CogView: Mastering Text-to-Image Generation via Transformers
Ming Ding
Zhuoyi Yang
Wenyi Hong
Wendi Zheng
Chang Zhou
...
Junyang Lin
Xu Zou
Zhou Shao
Hongxia Yang
Jie Tang
ViT
VLM
125
782
0
26 May 2021
Diffusion Models Beat GANs on Image Synthesis
Prafulla Dhariwal
Alex Nichol
263
7,938
0
11 May 2021
HuMoR: 3D Human Motion Model for Robust Pose Estimation
Davis Rempe
Tolga Birdal
Aaron Hertzmann
Jimei Yang
Srinath Sridhar
Leonidas Guibas
3DH
95
316
0
10 May 2021
GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions
Chenfei Wu
Lun Huang
Qianxi Zhang
Binyang Li
Lei Ji
Fan Yang
Guillermo Sapiro
Nan Duan
DiffM
VGen
80
242
0
30 Apr 2021
Diverse Image Inpainting with Bidirectional and Autoregressive Transformers
Yingchen Yu
Fangneng Zhan
Rongliang Wu
Jianxiong Pan
Kaiwen Cui
Shijian Lu
Feiying Ma
Xuansong Xie
Chunyan Miao
ViT
87
152
0
26 Apr 2021
VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan
Yunzhi Zhang
Pieter Abbeel
A. Srinivas
ViT
VGen
310
512
0
20 Apr 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
284
2,521
0
20 Apr 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
463
21,564
0
25 Mar 2021
Generating Images with Sparse Representations
C. Nash
Jacob Menick
Sander Dieleman
Peter W. Battaglia
81
211
0
05 Mar 2021
Predicting Video with VQVAE
Jacob Walker
Ali Razavi
Aaron van den Oord
DRL
101
69
0
02 Mar 2021
M6: A Chinese Multimodal Pretrainer
Junyang Lin
Rui Men
An Yang
Chan Zhou
Ming Ding
...
Yong Li
Wei Lin
Jingren Zhou
J. Tang
Hongxia Yang
VLM
MoE
91
134
0
01 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
967
29,810
0
26 Feb 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
418
4,996
0
24 Feb 2021
GAN Inversion: A Survey
Weihao Xia
Yulun Zhang
Yujiu Yang
Jing-Hao Xue
Bolei Zhou
Ming-Hsuan Yang
DiffM
180
516
0
14 Jan 2021
Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis
Bingchen Liu
Yizhe Zhu
Kunpeng Song
Ahmed Elgammal
229
239
0
12 Jan 2021
Taming Transformers for High-Resolution Image Synthesis
Patrick Esser
Robin Rombach
Bjorn Ommer
ViT
131
2,999
0
17 Dec 2020
Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images
R. Child
BDL
VLM
184
352
0
20 Nov 2020
Scaling Laws for Autoregressive Generative Modeling
T. Henighan
Jared Kaplan
Mor Katz
Mark Chen
Christopher Hesse
...
Nick Ryder
Daniel M. Ziegler
John Schulman
Dario Amodei
Sam McCandlish
107
429
0
28 Oct 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
670
41,430
0
22 Oct 2020
Mastering Atari with Discrete World Models
Danijar Hafner
Timothy Lillicrap
Mohammad Norouzi
Jimmy Ba
DRL
117
869
0
05 Oct 2020
Incorporating Reinforced Adversarial Learning in Autoregressive Image Generation
Kenan E. Ak
N. Xu
Zhe Lin
Yilin Wang
59
13
0
20 Jul 2020
Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects
Bryan G. Cardenas
Devanshu Arya
D. K. Gupta
DiffM
76
6
0
22 Jun 2020
Denoising Diffusion Probabilistic Models
Jonathan Ho
Ajay Jain
Pieter Abbeel
DiffM
677
18,310
0
19 Jun 2020
Latent Video Transformer
Ruslan Rakhimov
Denis Volkhonskiy
Alexey Artemov
Denis Zorin
Evgeny Burnaev
VGen
100
121
0
18 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
859
42,379
0
28 May 2020
RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
Zachary Teed
Jia Deng
MDE
247
2,644
0
26 Mar 2020
Decision-Making with Auto-Encoding Variational Bayes
Romain Lopez
Pierre Boyeau
Nir Yosef
Michael I. Jordan
Jeffrey Regier
BDL
491
10,591
0
17 Feb 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
611
4,905
0
23 Jan 2020
Dream to Control: Learning Behaviors by Latent Imagination
Danijar Hafner
Timothy Lillicrap
Jimmy Ba
Mohammad Norouzi
VLM
126
1,371
0
03 Dec 2019
Scaling Autoregressive Video Models
Dirk Weissenborn
Oscar Täckström
Jakob Uszkoreit
DiffM
VGen
94
204
0
06 Jun 2019
Generating Diverse High-Fidelity Images with VQ-VAE-2
Ali Razavi
Aaron van den Oord
Oriol Vinyals
DRL
BDL
147
1,827
0
02 Jun 2019
Improved Precision and Recall Metric for Assessing Generative Models
Tuomas Kynkaanniemi
Tero Karras
S. Laine
J. Lehtinen
Timo Aila
EGVM
105
865
0
15 Apr 2019
Model-Based Reinforcement Learning for Atari
Lukasz Kaiser
Mohammad Babaeizadeh
Piotr Milos
B. Osinski
R. Campbell
...
Sergey Levine
Afroz Mohiuddin
Ryan Sepassi
George Tucker
Henryk Michalewski
OffRL
138
867
0
01 Mar 2019
Towards Accurate Generative Models of Video: A New Metric & Challenges
Thomas Unterthiner
Sjoerd van Steenkiste
Karol Kurach
Raphaël Marinier
Marcin Michalski
Sylvain Gelly
EGVM
VGen
91
745
0
03 Dec 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
95,175
0
11 Oct 2018
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Andrew Brock
Jeff Donahue
Karen Simonyan
269
5,403
0
28 Sep 2018
Learning Blind Video Temporal Consistency
Wei-Sheng Lai
Jia-Bin Huang
Oliver Wang
Eli Shechtman
Ersin Yumer
Ming-Hsuan Yang
93
368
0
01 Aug 2018
Image Transformer
Niki Parmar
Ashish Vaswani
Jakob Uszkoreit
Lukasz Kaiser
Noam M. Shazeer
Alexander Ku
Dustin Tran
ViT
141
1,684
0
15 Feb 2018
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Richard Y. Zhang
Phillip Isola
Alexei A. Efros
Eli Shechtman
Oliver Wang
EGVM
384
11,905
0
11 Jan 2018
Demystifying MMD GANs
Mikolaj Binkowski
Danica J. Sutherland
Michael Arbel
Arthur Gretton
EGVM
171
1,500
0
04 Jan 2018
PixelSNAIL: An Improved Autoregressive Generative Model
Xi Chen
Nikhil Mishra
Mostafa Rohaninejad
Pieter Abbeel
DRL
DiffM
BDL
GAN
78
276
0
28 Dec 2017
Neural Discrete Representation Learning
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL
SSL
OCL
230
5,071
0
02 Nov 2017
Stochastic Variational Video Prediction
Mohammad Babaeizadeh
Chelsea Finn
D. Erhan
R. Campbell
Sergey Levine
DRL
VGen
84
543
0
30 Oct 2017
MoCoGAN: Decomposing Motion and Content for Video Generation
Sergey Tulyakov
Ming-Yuan Liu
Xiaodong Yang
Jan Kautz
GAN
144
1,149
0
17 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
730
132,363
0
12 Jun 2017
Parallel Multiscale Autoregressive Density Estimation
Scott E. Reed
Aaron van den Oord
Nal Kalchbrenner
Sergio Gomez Colmenarejo
Ziyun Wang
Dan Belov
Nando de Freitas
BDL
92
207
0
10 Mar 2017
PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications
Tim Salimans
A. Karpathy
Xi Chen
Diederik P. Kingma
107
944
0
19 Jan 2017
Mode Regularized Generative Adversarial Networks
Tong Che
Yanran Li
Athul Paul Jacob
Yoshua Bengio
Wenjie Li
GAN
128
556
0
07 Dec 2016
PixelVAE: A Latent Variable Model for Natural Images
Ishaan Gulrajani
Kundan Kumar
Faruk Ahmed
Adrien Ali Taïga
Francesco Visin
David Vazquez
Aaron Courville
DRL
SSL
BDL
80
340
0
15 Nov 2016
Variational Lossy Autoencoder
Xi Chen
Diederik P. Kingma
Tim Salimans
Yan Duan
Prafulla Dhariwal
John Schulman
Ilya Sutskever
Pieter Abbeel
DRL
SSL
GAN
152
676
0
08 Nov 2016