ResearchTrend.AI
Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

4 March 2022
Ligong Han
Jian Ren
Hsin-Ying Lee
Francesco Barbieri
Kyle Olszewski
Shervin Minaee
Dimitris N. Metaxas
Sergey Tulyakov
    DiffM, VGen

Papers citing "Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning"

50 / 65 papers shown
DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT
Xiaotao Hu
Wei Yin
Mingkai Jia
Junyuan Deng
Xiaoyang Guo
Qian Zhang
Xiaoxiao Long
Ping Tan
VGen
146
14
0
31 Dec 2024
DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models
Xiaoxiao He
Ligong Han
Quan Dao
Song Wen
Minhao Bai
...
Hongdong Li
Junzhou Huang
Faez Ahmed
Akash Srivastava
Dimitris Metaxas
DiffM, SyDa
149
5
0
10 Oct 2024
SINE: SINgle Image Editing with Text-to-Image Diffusion Models
Zhixing Zhang
Ligong Han
Arna Ghosh
Dimitris N. Metaxas
Jian Ren
DiffM
152
160
0
08 Dec 2022
Stochastic Video Prediction with Structure and Motion
Adil Kaan Akan
Sadra Safadoust
Fatma Guney
VGen
70
10
0
20 Mar 2022
Generative Adversarial Networks
Gilad Cohen
Raja Giryes
GAN
298
30,152
0
01 Mar 2022
SLAMP: Stochastic Latent Appearance and Motion Prediction
Adil Kaan Akan
Erkut Erdem
Aykut Erdem
Fatma Guney
51
40
0
05 Aug 2021
Generative Video Transformer: Can Objects be the Words?
Yi-Fu Wu
Jaesik Yoon
Sungjin Ahn
ViT
95
34
0
20 Jul 2021
FitVid: Overfitting in Pixel-Level Video Prediction
Mohammad Babaeizadeh
M. Saffar
Suraj Nair
Sergey Levine
Chelsea Finn
D. Erhan
VLM
107
84
0
24 Jun 2021
Understanding Object Dynamics for Interactive Image-to-Video Synthesis
A. Blattmann
Timo Milbich
Michael Dorkenwald
Bjorn Ommer
DiffM, VGen
71
40
0
21 Jun 2021
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
Hao Tan
Jie Lei
Thomas Wolf
Joey Tianyi Zhou
102
66
0
21 Jun 2021
Flow Guided Transformable Bottleneck Networks for Motion Retargeting
Jian Ren
Menglei Chai
Oliver J. Woodford
Kyle Olszewski
Sergey Tulyakov
3DH
54
24
0
14 Jun 2021
CogView: Mastering Text-to-Image Generation via Transformers
Ming Ding
Zhuoyi Yang
Wenyi Hong
Wendi Zheng
Chang Zhou
...
Junyang Lin
Xu Zou
Zhou Shao
Hongxia Yang
Jie Tang
ViT, VLM
125
782
0
26 May 2021
Stochastic Image-to-Video Synthesis using cINNs
Michael Dorkenwald
Timo Milbich
A. Blattmann
Robin Rombach
Konstantinos G. Derpanis
Bjorn Ommer
DiffM, VGen
80
55
0
10 May 2021
A Good Image Generator Is What You Need for High-Resolution Video Synthesis
Yu Tian
Jian Ren
Menglei Chai
Kyle Olszewski
Xi Peng
Dimitris N. Metaxas
Sergey Tulyakov
VGen
102
186
0
30 Apr 2021
GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions
Chenfei Wu
Lun Huang
Qianxi Zhang
Binyang Li
Lei Ji
Fan Yang
Guillermo Sapiro
Nan Duan
DiffM, VGen
84
243
0
30 Apr 2021
Motion Representations for Articulated Animation
Aliaksandr Siarohin
Oliver J. Woodford
Jian Ren
Menglei Chai
Sergey Tulyakov
OCL
184
274
0
22 Apr 2021
VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan
Yunzhi Zhang
Pieter Abbeel
A. Srinivas
ViT, VGen
310
512
0
20 Apr 2021
Predicting Video with VQVAE
Jacob Walker
Ali Razavi
Aaron van den Oord
DRL
110
69
0
02 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP, VLM
999
29,871
0
26 Feb 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
420
5,000
0
24 Feb 2021
Taming Transformers for High-Resolution Image Synthesis
Patrick Esser
Robin Rombach
Bjorn Ommer
ViT
133
3,004
0
17 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
682
41,483
0
22 Oct 2020
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
Jaemin Cho
Jiasen Lu
Dustin Schwenk
Hannaneh Hajishirzi
Aniruddha Kembhavi
VLM, MLLM
89
102
0
23 Sep 2020
TiVGAN: Text to Image to Video Generation with Step-by-Step Evolutionary Generator
Doyeon Kim
Donggyu Joo
Junmo Kim
GAN
65
48
0
04 Sep 2020
Latent Video Transformer
Ruslan Rakhimov
Denis Volkhonskiy
Alexey Artemov
Denis Zorin
Evgeny Burnaev
VGen
100
121
0
18 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
889
42,463
0
28 May 2020
BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation
Changqian Yu
Changxin Gao
Jingbo Wang
Gang Yu
Chunhua Shen
Nong Sang
SSeg
90
1,218
0
05 Apr 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
ViT
161
440
0
02 Apr 2020
First Order Motion Model for Image Animation
Aliaksandr Siarohin
Stéphane Lathuilière
Sergey Tulyakov
Elisa Ricci
N. Sebe
VGen, DiffM
125
936
0
29 Feb 2020
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
Di Qi
Lin Su
Jianwei Song
Edward Cui
Taroon Bharti
Arun Sacheti
VLM
102
261
0
22 Jan 2020
Lower Dimensional Kernels for Video Discriminators
Emmanuel Kahembwe
S. Ramamoorthy
71
51
0
18 Dec 2019
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
553
42,639
0
03 Dec 2019
Markov Decision Process for Video Generation
V. Yushchenko
Nikita Araslanov
Stefan Roth
VGen
72
20
0
26 Sep 2019
Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis
Wen Liu
Zhixin Piao
Jie Min
Wenhan Luo
Lin Ma
Shenghua Gao
DiffM
64
259
0
26 Sep 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
689
24,557
0
26 Jul 2019
Scaling Autoregressive Video Models
Dirk Weissenborn
Oscar Täckström
Jakob Uszkoreit
DiffM, VGen
94
204
0
06 Jun 2019
Masked Non-Autoregressive Image Captioning
Junlong Gao
Xi Meng
Shiqi Wang
Xia Li
Shanshe Wang
Siwei Ma
Wen Gao
75
37
0
03 Jun 2019
Generating Diverse High-Fidelity Images with VQ-VAE-2
Ali Razavi
Aaron van den Oord
Oriol Vinyals
DRL, BDL
151
1,828
0
02 Jun 2019
A Generalized Framework of Sequence Generation with Application to Undirected Sequence Models
Elman Mansimov
Alex Jinpeng Wang
Sean Welleck
Kyunghyun Cho
AIMat
56
46
0
29 May 2019
Video Generation from Single Semantic Label Map
Junting Pan
Chengyu Wang
Xu Jia
Jing Shao
Lu Sheng
Junjie Yan
Xiaogang Wang
VGen
46
104
0
11 Mar 2019
BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model
Alex Jinpeng Wang
Kyunghyun Cho
VLM
103
358
0
11 Feb 2019
Towards Accurate Generative Models of Video: A New Metric & Challenges
Thomas Unterthiner
Sjoerd van Steenkiste
Karol Kurach
Raphaël Marinier
Marcin Michalski
Sylvain Gelly
EGVM, VGen
93
747
0
03 Dec 2018
Train Sparsely, Generate Densely: Memory-efficient Unsupervised Training of High-resolution Temporal GAN
Masaki Saito
Shunta Saito
Masanori Koyama
Sosuke Kobayashi
94
147
0
22 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM, SSL, SSeg
1.8K
95,229
0
11 Oct 2018
Towards High Resolution Video Generation with Progressive Growing of Sliced Wasserstein GANs
Dinesh Acharya
Zhiwu Huang
D. Paudel
Luc Van Gool
GAN
53
68
0
04 Oct 2018
Learning to Decompose and Disentangle Representations for Video Prediction
Jun-Ting Hsieh
Bingbin Liu
De-An Huang
Li Fei-Fei
Juan Carlos Niebles
DRL
186
306
0
11 Jun 2018
Assessing Generative Models via Precision and Recall
Mehdi S. M. Sajjadi
Olivier Bachem
Mario Lucic
Olivier Bousquet
Sylvain Gelly
EGVM
95
584
0
31 May 2018
To Create What You Tell: Generating Videos from Captions
Yingwei Pan
Zhaofan Qiu
Ting Yao
Houqiang Li
Tao Mei
GAN
81
154
0
23 Apr 2018
Stochastic Video Generation with a Learned Prior
Emily L. Denton
Rob Fergus
VGen
106
526
0
21 Feb 2018
Non-Autoregressive Neural Machine Translation
Jiatao Gu
James Bradbury
Caiming Xiong
Victor O.K. Li
R. Socher
107
798
0
07 Nov 2017