2203.02573
Cited By
Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
4 March 2022
Ligong Han
Jian Ren
Hsin-Ying Lee
Francesco Barbieri
Kyle Olszewski
Shervin Minaee
Dimitris N. Metaxas
Sergey Tulyakov
DiffM
VGen
Papers citing
"Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning"
50 / 65 papers shown
DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT
Xiaotao Hu
Wei Yin
Mingkai Jia
Junyuan Deng
Xiaoyang Guo
Qian Zhang
Xiaoxiao Long
Ping Tan
VGen
146
14
0
31 Dec 2024
DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models
Xiaoxiao He
Ligong Han
Quan Dao
Song Wen
Minhao Bai
...
Hongdong Li
Junzhou Huang
Faez Ahmed
Akash Srivastava
Dimitris Metaxas
DiffM
SyDa
149
5
0
10 Oct 2024
SINE: SINgle Image Editing with Text-to-Image Diffusion Models
Zhixing Zhang
Ligong Han
Arna Ghosh
Dimitris N. Metaxas
Jian Ren
DiffM
152
160
0
08 Dec 2022
Stochastic Video Prediction with Structure and Motion
Adil Kaan Akan
Sadra Safadoust
Fatma Guney
VGen
70
10
0
20 Mar 2022
Generative Adversarial Networks
Gilad Cohen
Raja Giryes
GAN
298
30,152
0
01 Mar 2022
SLAMP: Stochastic Latent Appearance and Motion Prediction
Adil Kaan Akan
Erkut Erdem
Aykut Erdem
Fatma Guney
51
40
0
05 Aug 2021
Generative Video Transformer: Can Objects be the Words?
Yi-Fu Wu
Jaesik Yoon
Sungjin Ahn
ViT
95
34
0
20 Jul 2021
FitVid: Overfitting in Pixel-Level Video Prediction
Mohammad Babaeizadeh
M. Saffar
Suraj Nair
Sergey Levine
Chelsea Finn
D. Erhan
VLM
107
84
0
24 Jun 2021
Understanding Object Dynamics for Interactive Image-to-Video Synthesis
A. Blattmann
Timo Milbich
Michael Dorkenwald
Bjorn Ommer
DiffM
VGen
71
40
0
21 Jun 2021
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
Hao Tan
Jie Lei
Thomas Wolf
Joey Tianyi Zhou
102
66
0
21 Jun 2021
Flow Guided Transformable Bottleneck Networks for Motion Retargeting
Jian Ren
Menglei Chai
Oliver J. Woodford
Kyle Olszewski
Sergey Tulyakov
3DH
54
24
0
14 Jun 2021
CogView: Mastering Text-to-Image Generation via Transformers
Ming Ding
Zhuoyi Yang
Wenyi Hong
Wendi Zheng
Chang Zhou
...
Junyang Lin
Xu Zou
Zhou Shao
Hongxia Yang
Jie Tang
ViT
VLM
125
782
0
26 May 2021
Stochastic Image-to-Video Synthesis using cINNs
Michael Dorkenwald
Timo Milbich
A. Blattmann
Robin Rombach
Konstantinos G. Derpanis
Bjorn Ommer
DiffM
VGen
80
55
0
10 May 2021
A Good Image Generator Is What You Need for High-Resolution Video Synthesis
Yu Tian
Jian Ren
Menglei Chai
Kyle Olszewski
Xi Peng
Dimitris N. Metaxas
Sergey Tulyakov
VGen
102
186
0
30 Apr 2021
GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions
Chenfei Wu
Lun Huang
Qianxi Zhang
Binyang Li
Lei Ji
Fan Yang
Guillermo Sapiro
Nan Duan
DiffM
VGen
84
243
0
30 Apr 2021
Motion Representations for Articulated Animation
Aliaksandr Siarohin
Oliver J. Woodford
Jian Ren
Menglei Chai
Sergey Tulyakov
OCL
184
274
0
22 Apr 2021
VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan
Yunzhi Zhang
Pieter Abbeel
A. Srinivas
ViT
VGen
310
512
0
20 Apr 2021
Predicting Video with VQVAE
Jacob Walker
Ali Razavi
Aaron van den Oord
DRL
110
69
0
02 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
999
29,871
0
26 Feb 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
420
5,000
0
24 Feb 2021
Taming Transformers for High-Resolution Image Synthesis
Patrick Esser
Robin Rombach
Bjorn Ommer
ViT
133
3,004
0
17 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
682
41,483
0
22 Oct 2020
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
Jaemin Cho
Jiasen Lu
Dustin Schwenk
Hannaneh Hajishirzi
Aniruddha Kembhavi
VLM
MLLM
89
102
0
23 Sep 2020
TiVGAN: Text to Image to Video Generation with Step-by-Step Evolutionary Generator
Doyeon Kim
Donggyu Joo
Junmo Kim
GAN
65
48
0
04 Sep 2020
Latent Video Transformer
Ruslan Rakhimov
Denis Volkhonskiy
Alexey Artemov
Denis Zorin
Evgeny Burnaev
VGen
100
121
0
18 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
889
42,463
0
28 May 2020
BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation
Changqian Yu
Changxin Gao
Jingbo Wang
Gang Yu
Chunhua Shen
Nong Sang
SSeg
90
1,218
0
05 Apr 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
ViT
161
440
0
02 Apr 2020
First Order Motion Model for Image Animation
Aliaksandr Siarohin
Stéphane Lathuilière
Sergey Tulyakov
Elisa Ricci
N. Sebe
VGen
DiffM
125
936
0
29 Feb 2020
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
Di Qi
Lin Su
Jianwei Song
Edward Cui
Taroon Bharti
Arun Sacheti
VLM
102
261
0
22 Jan 2020
Lower Dimensional Kernels for Video Discriminators
Emmanuel Kahembwe
S. Ramamoorthy
71
51
0
18 Dec 2019
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
553
42,639
0
03 Dec 2019
Markov Decision Process for Video Generation
V. Yushchenko
Nikita Araslanov
Stefan Roth
VGen
72
20
0
26 Sep 2019
Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis
Wen Liu
Zhixin Piao
Jie Min
Wenhan Luo
Lin Ma
Shenghua Gao
DiffM
64
259
0
26 Sep 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
689
24,557
0
26 Jul 2019
Scaling Autoregressive Video Models
Dirk Weissenborn
Oscar Täckström
Jakob Uszkoreit
DiffM
VGen
94
204
0
06 Jun 2019
Masked Non-Autoregressive Image Captioning
Junlong Gao
Xi Meng
Shiqi Wang
Xia Li
Shanshe Wang
Siwei Ma
Wen Gao
75
37
0
03 Jun 2019
Generating Diverse High-Fidelity Images with VQ-VAE-2
Ali Razavi
Aaron van den Oord
Oriol Vinyals
DRL
BDL
151
1,828
0
02 Jun 2019
A Generalized Framework of Sequence Generation with Application to Undirected Sequence Models
Elman Mansimov
Alex Jinpeng Wang
Sean Welleck
Kyunghyun Cho
AIMat
56
46
0
29 May 2019
Video Generation from Single Semantic Label Map
Junting Pan
Chengyu Wang
Xu Jia
Jing Shao
Lu Sheng
Junjie Yan
Xiaogang Wang
VGen
46
104
0
11 Mar 2019
BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model
Alex Jinpeng Wang
Kyunghyun Cho
VLM
103
358
0
11 Feb 2019
Towards Accurate Generative Models of Video: A New Metric & Challenges
Thomas Unterthiner
Sjoerd van Steenkiste
Karol Kurach
Raphaël Marinier
Marcin Michalski
Sylvain Gelly
EGVM
VGen
93
747
0
03 Dec 2018
Train Sparsely, Generate Densely: Memory-efficient Unsupervised Training of High-resolution Temporal GAN
Masaki Saito
Shunta Saito
Masanori Koyama
Sosuke Kobayashi
94
147
0
22 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
95,229
0
11 Oct 2018
Towards High Resolution Video Generation with Progressive Growing of Sliced Wasserstein GANs
Dinesh Acharya
Zhiwu Huang
D. Paudel
Luc Van Gool
GAN
53
68
0
04 Oct 2018
Learning to Decompose and Disentangle Representations for Video Prediction
Jun-Ting Hsieh
Bingbin Liu
De-An Huang
Li Fei-Fei
Juan Carlos Niebles
DRL
186
306
0
11 Jun 2018
Assessing Generative Models via Precision and Recall
Mehdi S. M. Sajjadi
Olivier Bachem
Mario Lucic
Olivier Bousquet
Sylvain Gelly
EGVM
95
584
0
31 May 2018
To Create What You Tell: Generating Videos from Captions
Yingwei Pan
Zhaofan Qiu
Ting Yao
Houqiang Li
Tao Mei
GAN
81
154
0
23 Apr 2018
Stochastic Video Generation with a Learned Prior
Emily L. Denton
Rob Fergus
VGen
106
526
0
21 Feb 2018
Non-Autoregressive Neural Machine Translation
Jiatao Gu
James Bradbury
Caiming Xiong
Victor O.K. Li
R. Socher
107
798
0
07 Nov 2017