Autoregressive Models in Vision: A Survey

8 November 2024
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
Yao Mu
Yuan Yao
Jikang Cheng
Zhongwei Wan
Jinfa Huang
Chaofan Tao
Shen Yan
Huaxiu Yao
Lingpeng Kong
Hongxia Yang
Hao Fei
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
ArXiv (abs) | PDF | HTML | GitHub (625★)

Papers citing "Autoregressive Models in Vision: A Survey"

50 / 211 papers shown
CogView: Mastering Text-to-Image Generation via Transformers
Ming Ding
Zhuoyi Yang
Wenyi Hong
Wendi Zheng
Chang Zhou
...
Junyang Lin
Xu Zou
Zhou Shao
Hongxia Yang
Jie Tang
ViT, VLM
125
782
0
26 May 2021
Diffusion Models Beat GANs on Image Synthesis
Prafulla Dhariwal
Alex Nichol
263
7,938
0
11 May 2021
HuMoR: 3D Human Motion Model for Robust Pose Estimation
Davis Rempe
Tolga Birdal
Aaron Hertzmann
Jimei Yang
Srinath Sridhar
Leonidas Guibas
3DH
95
316
0
10 May 2021
GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions
Chenfei Wu
Lun Huang
Qianxi Zhang
Binyang Li
Lei Ji
Fan Yang
Guillermo Sapiro
Nan Duan
DiffM, VGen
80
242
0
30 Apr 2021
Diverse Image Inpainting with Bidirectional and Autoregressive Transformers
Yingchen Yu
Fangneng Zhan
Rongliang Wu
Jianxiong Pan
Kaiwen Cui
Shijian Lu
Feiying Ma
Xuansong Xie
Chunyan Miao
ViT
87
152
0
26 Apr 2021
VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan
Yunzhi Zhang
Pieter Abbeel
A. Srinivas
ViT, VGen
310
512
0
20 Apr 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
284
2,521
0
20 Apr 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
463
21,564
0
25 Mar 2021
Generating Images with Sparse Representations
C. Nash
Jacob Menick
Sander Dieleman
Peter W. Battaglia
81
211
0
05 Mar 2021
Predicting Video with VQVAE
Jacob Walker
Ali Razavi
Aaron van den Oord
DRL
101
69
0
02 Mar 2021
M6: A Chinese Multimodal Pretrainer
Junyang Lin
Rui Men
An Yang
Chan Zhou
Ming Ding
...
Yong Li
Wei Lin
Jingren Zhou
J. Tang
Hongxia Yang
VLM, MoE
91
134
0
01 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP, VLM
967
29,810
0
26 Feb 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
418
4,996
0
24 Feb 2021
GAN Inversion: A Survey
Weihao Xia
Yulun Zhang
Yujiu Yang
Jing-Hao Xue
Bolei Zhou
Ming-Hsuan Yang
DiffM
180
516
0
14 Jan 2021
Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis
Bingchen Liu
Yizhe Zhu
Kunpeng Song
Ahmed Elgammal
229
239
0
12 Jan 2021
Taming Transformers for High-Resolution Image Synthesis
Patrick Esser
Robin Rombach
Bjorn Ommer
ViT
131
2,999
0
17 Dec 2020
Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images
R. Child
BDL, VLM
184
352
0
20 Nov 2020
Scaling Laws for Autoregressive Generative Modeling
T. Henighan
Jared Kaplan
Mor Katz
Mark Chen
Christopher Hesse
...
Nick Ryder
Daniel M. Ziegler
John Schulman
Dario Amodei
Sam McCandlish
107
429
0
28 Oct 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
670
41,430
0
22 Oct 2020
Mastering Atari with Discrete World Models
Danijar Hafner
Timothy Lillicrap
Mohammad Norouzi
Jimmy Ba
DRL
117
869
0
05 Oct 2020
Incorporating Reinforced Adversarial Learning in Autoregressive Image Generation
Kenan E. Ak
N. Xu
Zhe Lin
Yilin Wang
59
13
0
20 Jul 2020
Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects
Bryan G. Cardenas
Devanshu Arya
D. K. Gupta
DiffM
76
6
0
22 Jun 2020
Denoising Diffusion Probabilistic Models
Jonathan Ho
Ajay Jain
Pieter Abbeel
DiffM
677
18,310
0
19 Jun 2020
Latent Video Transformer
Ruslan Rakhimov
Denis Volkhonskiy
Alexey Artemov
Denis Zorin
Evgeny Burnaev
VGen
100
121
0
18 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
859
42,379
0
28 May 2020
RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
Zachary Teed
Jia Deng
MDE
247
2,644
0
26 Mar 2020
Decision-Making with Auto-Encoding Variational Bayes
Romain Lopez
Pierre Boyeau
Nir Yosef
Michael I. Jordan
Jeffrey Regier
BDL
491
10,591
0
17 Feb 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
611
4,905
0
23 Jan 2020
Dream to Control: Learning Behaviors by Latent Imagination
Danijar Hafner
Timothy Lillicrap
Jimmy Ba
Mohammad Norouzi
VLM
126
1,371
0
03 Dec 2019
Scaling Autoregressive Video Models
Dirk Weissenborn
Oscar Täckström
Jakob Uszkoreit
DiffM, VGen
94
204
0
06 Jun 2019
Generating Diverse High-Fidelity Images with VQ-VAE-2
Ali Razavi
Aaron van den Oord
Oriol Vinyals
DRL, BDL
147
1,827
0
02 Jun 2019
Improved Precision and Recall Metric for Assessing Generative Models
Tuomas Kynkaanniemi
Tero Karras
S. Laine
J. Lehtinen
Timo Aila
EGVM
105
865
0
15 Apr 2019
Model-Based Reinforcement Learning for Atari
Lukasz Kaiser
Mohammad Babaeizadeh
Piotr Milos
B. Osinski
R. Campbell
...
Sergey Levine
Afroz Mohiuddin
Ryan Sepassi
George Tucker
Henryk Michalewski
OffRL
138
867
0
01 Mar 2019
Towards Accurate Generative Models of Video: A New Metric & Challenges
Thomas Unterthiner
Sjoerd van Steenkiste
Karol Kurach
Raphaël Marinier
Marcin Michalski
Sylvain Gelly
EGVM, VGen
91
745
0
03 Dec 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM, SSL, SSeg
1.8K
95,175
0
11 Oct 2018
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Andrew Brock
Jeff Donahue
Karen Simonyan
269
5,403
0
28 Sep 2018
Learning Blind Video Temporal Consistency
Wei-Sheng Lai
Jia-Bin Huang
Oliver Wang
Eli Shechtman
Ersin Yumer
Ming-Hsuan Yang
93
368
0
01 Aug 2018
Image Transformer
Niki Parmar
Ashish Vaswani
Jakob Uszkoreit
Lukasz Kaiser
Noam M. Shazeer
Alexander Ku
Dustin Tran
ViT
141
1,684
0
15 Feb 2018
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Richard Y. Zhang
Phillip Isola
Alexei A. Efros
Eli Shechtman
Oliver Wang
EGVM
384
11,905
0
11 Jan 2018
Demystifying MMD GANs
Mikolaj Binkowski
Danica J. Sutherland
Michael Arbel
Arthur Gretton
EGVM
171
1,500
0
04 Jan 2018
PixelSNAIL: An Improved Autoregressive Generative Model
Xi Chen
Nikhil Mishra
Mostafa Rohaninejad
Pieter Abbeel
DRL, DiffM, BDL, GAN
78
276
0
28 Dec 2017
Neural Discrete Representation Learning
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL, SSL, OCL
230
5,071
0
02 Nov 2017
Stochastic Variational Video Prediction
Mohammad Babaeizadeh
Chelsea Finn
D. Erhan
R. Campbell
Sergey Levine
DRL, VGen
84
543
0
30 Oct 2017
MoCoGAN: Decomposing Motion and Content for Video Generation
Sergey Tulyakov
Ming-Yuan Liu
Xiaodong Yang
Jan Kautz
GAN
144
1,149
0
17 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
730
132,363
0
12 Jun 2017
Parallel Multiscale Autoregressive Density Estimation
Scott E. Reed
Aaron van den Oord
Nal Kalchbrenner
Sergio Gomez Colmenarejo
Ziyun Wang
Dan Belov
Nando de Freitas
BDL
92
207
0
10 Mar 2017
PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications
Tim Salimans
A. Karpathy
Xi Chen
Diederik P. Kingma
107
944
0
19 Jan 2017
Mode Regularized Generative Adversarial Networks
Tong Che
Yanran Li
Athul Paul Jacob
Yoshua Bengio
Wenjie Li
GAN
128
556
0
07 Dec 2016
PixelVAE: A Latent Variable Model for Natural Images
Ishaan Gulrajani
Kundan Kumar
Faruk Ahmed
Adrien Ali Taïga
Francesco Visin
David Vazquez
Aaron Courville
DRL, SSL, BDL
80
340
0
15 Nov 2016
Variational Lossy Autoencoder
Xi Chen
Diederik P. Kingma
Tim Salimans
Yan Duan
Prafulla Dhariwal
John Schulman
Ilya Sutskever
Pieter Abbeel
DRL, SSL, GAN
152
676
0
08 Nov 2016