Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1906.02634
Cited By
Scaling Autoregressive Video Models
6 June 2019
Dirk Weissenborn
Oscar Täckström
Jakob Uszkoreit
DiffM
VGen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Scaling Autoregressive Video Models"
50 / 77 papers shown
Title
R
^R
R
FLAV: Rolling Flow matching for infinite Audio Video generation
Alex Ergasti
Giuseppe Tarollo
Filippo Botti
Tomaso Fontanini
Claudio Ferrari
Massimo Bertozzi
Andrea Prati
VGen
50
0
0
13 Mar 2025
Unified Video Action Model
Shuang Li
Yihuai Gao
Dorsa Sadigh
Shuran Song
VGen
56
2
0
28 Feb 2025
MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation
Sihyun Yu
Meera Hahn
Dan Kondratyuk
Jinwoo Shin
Agrim Gupta
José Lezama
Irfan Essa
David A. Ross
Jonathan Huang
DiffM
VGen
77
0
0
18 Feb 2025
A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Yuki Mitsufuji
VGen
DiffM
68
4
0
26 Sep 2024
Real-Time Video Generation with Pyramid Attention Broadcast
Xuanlei Zhao
Xiaolong Jin
Kai Wang
Yang You
VGen
DiffM
77
32
0
22 Aug 2024
Cognitively Inspired Energy-Based World Models
Alexi Gladstone
Ganesh Nanduru
Md. Mofijul Islam
Aman Chadha
Jundong Li
Tariq Iqbal
44
0
0
13 Jun 2024
SFTformer: A Spatial-Frequency-Temporal Correlation-Decoupling Transformer for Radar Echo Extrapolation
Liangyu Xu
Wanxuan Lu
Hongfeng Yu
Fanglong Yao
Xian Sun
Kun Fu
50
5
0
28 Feb 2024
CrisisViT: A Robust Vision Transformer for Crisis Image Classification
Zijun Long
R. McCreadie
Muhammad Imran
102
9
0
05 Jan 2024
Latte: Latent Diffusion Transformer for Video Generation
Xin Ma
Yaohui Wang
Gengyun Jia
Xinyuan Chen
Ziqiang Liu
Yuan-Fang Li
Cunjian Chen
Yu Qiao
DiffM
VGen
125
242
0
05 Jan 2024
MEVG: Multi-event Video Generation with Text-to-Video Models
Gyeongrok Oh
Jaehwan Jeong
Sieun Kim
Wonmin Byeon
Jinkyu Kim
Sungwoong Kim
Sangpil Kim
VGen
DiffM
35
21
0
07 Dec 2023
Breathing Life Into Sketches Using Text-to-Video Priors
Rinon Gal
Yael Vinker
Yuval Alaluf
Amit H. Bermano
Daniel Cohen-Or
Ariel Shamir
Gal Chechik
VGen
DiffM
40
29
0
21 Nov 2023
Matryoshka Diffusion Models
Jiatao Gu
Shuangfei Zhai
Yizhen Zhang
Joshua M. Susskind
Navdeep Jaitly
DiffM
21
43
0
23 Oct 2023
DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving
Xiaofeng Wang
Zheng Hua Zhu
Guan Huang
Xinze Chen
Jiagang Zhu
Jiwen Lu
VGen
27
149
0
18 Sep 2023
PreDiff: Precipitation Nowcasting with Latent Diffusion Models
Zhihan Gao
Xingjian Shi
Boran Han
Hongya Wang
Xiaoyong Jin
Danielle C. Maddix
Yi Zhu
Mu Li
Bernie Wang
BDL
DiffM
43
57
0
19 Jul 2023
SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking
Chris Cundy
Stefano Ermon
16
10
0
08 Jun 2023
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
A. Blattmann
Robin Rombach
Huan Ling
Tim Dockhorn
Seung Wook Kim
Sanja Fidler
Karsten Kreis
3DGS
VGen
124
1,016
0
18 Apr 2023
Object-Centric Video Prediction via Decoupling of Object Dynamics and Interactions
Angel Villar-Corrales
Ismail Wahdan
Sven Behnke
OCL
29
7
0
23 Feb 2023
Long-horizon video prediction using a dynamic latent hierarchy
Alexey Zakharov
Qinghai Guo
Z. Fountas
31
4
0
29 Dec 2022
Towards Smooth Video Composition
Qihang Zhang
Ceyuan Yang
Yujun Shen
Yinghao Xu
Bolei Zhou
VGen
44
14
0
14 Dec 2022
MAGVIT: Masked Generative Video Transformer
Lijun Yu
Yong Cheng
Kihyuk Sohn
José Lezama
Han Zhang
...
Alexander G. Hauptmann
Ming-Hsuan Yang
Yuan Hao
Irfan Essa
Lu Jiang
DiffM
VGen
38
228
0
10 Dec 2022
MIMO Is All You Need : A Strong Multi-In-Multi-Out Baseline for Video Prediction
Shuliang Ning
Mengcheng Lan
Yanran Li
Chaofeng Chen
Qian Chen
Xunlai Chen
Xiaoguang Han
Shuguang Cui
41
20
0
09 Dec 2022
Latent Video Diffusion Models for High-Fidelity Long Video Generation
Yin-Yin He
Tianyu Yang
Yong Zhang
Ying Shan
Qifeng Chen
DiffM
VGen
24
204
0
23 Nov 2022
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
Tsu-jui Fu
Licheng Yu
Ning Zhang
Cheng-Yang Fu
Jong-Chyi Su
William Yang Wang
Sean Bell
VGen
61
37
0
23 Nov 2022
SSGVS: Semantic Scene Graph-to-Video Synthesis
Yuren Cong
Jinhui Yi
Bodo Rosenhahn
M. Yang
67
7
0
11 Nov 2022
Disentangling Content and Motion for Text-Based Neural Video Manipulation
Levent Karacan
Tolga Kerimouglu
.Ismail .Inan
Tolga Birdal
Erkut Erdem
Aykut Erdem
29
1
0
05 Nov 2022
PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting
Thomas Lucas
Fabien Baradel
Philippe Weinzaepfel
Grégory Rogez
22
69
0
19 Oct 2022
Phenaki: Variable Length Video Generation From Open Domain Textual Description
Ruben Villegas
Mohammad Babaeizadeh
Pieter-Jan Kindermans
Hernan Moraldo
Han Zhang
M. Saffar
Santiago Castro
Julius Kunze
D. Erhan
DiffM
VGen
68
374
0
05 Oct 2022
UniColor: A Unified Framework for Multi-Modal Colorization with Transformer
Zhitong Huang
Nanxuan Zhao
Jing Liao
ViT
20
16
0
22 Sep 2022
EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer
Chenyu Yang
W. He
Yingqing Xu
Yang Gao
DiffM
24
26
0
20 Jul 2022
Earthformer: Exploring Space-Time Transformers for Earth System Forecasting
Zhihan Gao
Xingjian Shi
Hao Wang
Yi Zhu
Yuyang Wang
Mu Li
Dit-Yan Yeung
AI4TS
42
150
0
12 Jul 2022
3D-Aware Video Generation
Sherwin Bahmani
Jeong Joon Park
Despoina Paschalidou
Hao Tang
Gordon Wetzstein
Leonidas J. Guibas
Luc Van Gool
Radu Timofte
40
20
0
29 Jun 2022
MaskViT: Masked Visual Pre-Training for Video Prediction
Agrim Gupta
Stephen Tian
Yunzhi Zhang
Jiajun Wu
Roberto Martín-Martín
Li Fei-Fei
112
112
0
23 Jun 2022
Diffusion Models for Video Prediction and Infilling
Tobias Hoppe
Arash Mehrjou
Stefan Bauer
Didrik Nielsen
Andrea Dittadi
DiffM
VGen
38
131
0
15 Jun 2022
SimVP: Simpler yet Better Video Prediction
Zhangyang Gao
Cheng Tan
Lirong Wu
Stan Z. Li
46
211
0
09 Jun 2022
Cascaded Video Generation for Videos In-the-Wild
Lluis Castrejon
Nicolas Ballas
Aaron Courville
VGen
37
0
0
01 Jun 2022
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
256
570
0
29 May 2022
MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation
Vikram S. Voleti
Alexia Jolicoeur-Martineau
Christopher Pal
DiffM
VGen
13
291
0
19 May 2022
Brainish: Formalizing A Multimodal Language for Intelligence and Consciousness
Paul Pu Liang
30
4
0
14 Apr 2022
Malceiver: Perceiver with Hierarchical and Multi-modal Features for Android Malware Detection
Niall McLaughlin
30
2
0
12 Apr 2022
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer
Songwei Ge
Thomas Hayes
Harry Yang
Xiaoyue Yin
Guan Pang
David Jacobs
Jia-Bin Huang
Devi Parikh
ViT
56
215
0
07 Apr 2022
Video Diffusion Models
Jonathan Ho
Tim Salimans
Alexey A. Gritsenko
William Chan
Mohammad Norouzi
David J. Fleet
DiffM
VGen
70
1,523
0
07 Apr 2022
Reinforcement Learning with Action-Free Pre-Training from Videos
Younggyo Seo
Kimin Lee
Stephen James
Pieter Abbeel
SSL
OnRL
18
118
0
25 Mar 2022
Transframer: Arbitrary Frame Prediction with Generative Models
C. Nash
João Carreira
Jacob Walker
Iain Barr
Andrew Jaegle
Mateusz Malinowski
Peter W. Battaglia
ViT
27
37
0
17 Mar 2022
Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks
Sihyun Yu
Jihoon Tack
Sangwoo Mo
Hyunsu Kim
Junho Kim
Jung-Woo Ha
Jinwoo Shin
DiffM
VGen
40
199
0
21 Feb 2022
TransDreamer: Reinforcement Learning with Transformer World Models
Changgu Chen
Yi-Fu Wu
Jaesik Yoon
Sungjin Ahn
OffRL
32
90
0
19 Feb 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
24
103
0
16 Jan 2022
Low-Rank Constraints for Fast Inference in Structured Models
Justin T. Chiu
Yuntian Deng
Alexander M. Rush
BDL
32
13
0
08 Jan 2022
StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
Ivan Skorokhodov
Sergey Tulyakov
Mohamed Elhoseiny
VGen
40
279
0
29 Dec 2021
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Chenfei Wu
Jian Liang
Lei Ji
Fan Yang
Yuejian Fang
Daxin Jiang
Nan Duan
ViT
VGen
18
292
0
24 Nov 2021
VP-GO: a "light" action-conditioned visual prediction model
Anji Ma
Yoann Fleytoux
Jean-Baptiste Mouret
S. Ivaldi
17
0
0
26 Sep 2021
1
2
Next