A Survey on Video Diffusion Models

16 October 2023

Zuxuan Wu

Papers citing "A Survey on Video Diffusion Models"

32 / 132 papers shown

| Title | Authors | Tags | | | | Date |
|---|---|---|---|---|---|---|
| Score-Based Generative Modeling through Stochastic Differential Equations | Yang Song, Jascha Narain Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole | DiffM, SyDa | 294 | 6,409 | 0 | 26 Nov 2020 |
| Improved Techniques for Training Score-Based Generative Models | Yang Song, Stefano Ermon | DiffM | 209 | 1,150 | 0 | 16 Jun 2020 |
| End-to-End Object Detection with Transformers | Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko | ViT, 3DV, PINN | 361 | 13,002 | 0 | 26 May 2020 |
| First Order Motion Model for Image Animation | Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, N. Sebe | VGen, DiffM | 77 | 924 | 0 | 29 Feb 2020 |
| Analyzing and Improving the Image Quality of StyleGAN | Tero Karras, S. Laine, M. Aittala, Janne Hellsten, J. Lehtinen, Timo Aila | GAN | 260 | 5,797 | 0 | 03 Dec 2019 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu | AIMat | 381 | 20,053 | 0 | 23 Oct 2019 |
| Deep High-Resolution Representation Learning for Visual Recognition | Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, ..., Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao | | 381 | 3,602 | 0 | 20 Aug 2019 |
| Generative Modeling by Estimating Gradients of the Data Distribution | Yang Song, Stefano Ermon | SyDa, DiffM | 213 | 3,870 | 0 | 12 Jul 2019 |
| Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer | René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, V. Koltun | MDE | 197 | 1,786 | 0 | 02 Jul 2019 |
| HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips | Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, Josef Sivic | VGen | 105 | 1,199 | 0 | 07 Jun 2019 |
| VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research | Xin Eric Wang, Jiawei Wu, Junkun Chen, Lei Li, Yuan-fang Wang, William Yang Wang | | 93 | 549 | 0 | 06 Apr 2019 |
| Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov | VLM | 202 | 3,724 | 0 | 09 Jan 2019 |
| A Style-Based Generator Architecture for Generative Adversarial Networks | Tero Karras, S. Laine, Timo Aila | | 532 | 10,540 | 0 | 12 Dec 2018 |
| Train Sparsely, Generate Densely: Memory-efficient Unsupervised Training of High-resolution Temporal GAN | Masaki Saito, Shunta Saito, Masanori Koyama, Sosuke Kobayashi | | 71 | 146 | 0 | 22 Nov 2018 |
| TSM: Temporal Shift Module for Efficient Video Understanding | Ji Lin, Chuang Gan, Song Han | | 85 | 1,683 | 0 | 20 Nov 2018 |
| How2: A Large-scale Dataset for Multimodal Language Understanding | Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, Florian Metze | VGen, MLLM | 81 | 288 | 0 | 01 Nov 2018 |
| Video-to-Video Synthesis | Ting-Chun Wang, Ming-Yuan Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz, Bryan Catanzaro | GAN, VGen | 93 | 988 | 0 | 20 Aug 2018 |
| Real-world Anomaly Detection in Surveillance Videos | Waqas Sultani, Chen Chen, M. Shah | AI4TS | 166 | 1,477 | 0 | 12 Jan 2018 |
| CARLA: An Open Urban Driving Simulator | Alexey Dosovitskiy, G. Ros, Felipe Codevilla, Antonio M. López, V. Koltun | VLM | 133 | 5,146 | 0 | 10 Nov 2017 |
| Neural Discrete Representation Learning | Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu | BDL, SSL, OCL | 208 | 4,989 | 0 | 02 Nov 2017 |
| Learning to Generate Time-Lapse Videos Using Multi-Stage Dynamic Generative Adversarial Networks | Wei Xiong, Wenhan Luo, Lin Ma, Wen Liu, Jiebo Luo | GAN | 51 | 181 | 0 | 22 Sep 2017 |
| Localizing Moments in Video with Natural Language | Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan C. Russell | | 110 | 946 | 0 | 04 Aug 2017 |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | João Carreira, Andrew Zisserman | | 219 | 7,989 | 0 | 22 May 2017 |
| Dense-Captioning Events in Videos | Ranjay Krishna, Kenji Hata, F. Ren, Li Fei-Fei, Juan Carlos Niebles | | 134 | 1,242 | 0 | 02 May 2017 |
| The 2017 DAVIS Challenge on Video Object Segmentation | Jordi Pont-Tuset, Federico Perazzi, Sergi Caelles, Pablo Arbeláez, A. Sorkine-Hornung, Luc Van Gool | VGen, VOS | 78 | 1,205 | 0 | 03 Apr 2017 |
| Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields | Zhe Cao, Tomas Simon, S. Wei, Yaser Sheikh | 3DH | 149 | 6,528 | 0 | 24 Nov 2016 |
| The THUMOS Challenge on Action Recognition for Videos "in the Wild" | Haroon Idrees, Amir Zamir, Yu-Gang Jiang, Alexander N. Gorban, Ivan Laptev, Rahul Sukthankar, M. Shah | | 76 | 775 | 0 | 21 Apr 2016 |
| The Cityscapes Dataset for Semantic Urban Scene Understanding | Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele | | 1.0K | 11,587 | 0 | 06 Apr 2016 |
| Rethinking the Inception Architecture for Computer Vision | Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Z. Wojna | 3DV, BDL | 809 | 27,303 | 0 | 02 Dec 2015 |
| Deep Unsupervised Learning using Nonequilibrium Thermodynamics | Jascha Narain Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli | SyDa, DiffM | 263 | 6,887 | 0 | 12 Mar 2015 |
| Unsupervised Learning of Video Representations using LSTMs | Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov | SSL | 130 | 2,589 | 0 | 16 Feb 2015 |
| UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild | K. Soomro, Amir Zamir, M. Shah | CLIP, VGen | 135 | 6,145 | 0 | 03 Dec 2012 |