2405.10674
From Sora What We Can See: A Survey of Text-to-Video Generation
17 May 2024
Rui Sun
Yumin Zhang
Tejal Shah
Jiahao Sun
Shuoying Zhang
Wenqi Li
Haoran Duan
Bo Wei
R. Ranjan
EGVM
Papers citing
"From Sora What We Can See: A Survey of Text-to-Video Generation"
46 / 46 papers shown
Title
VACT: A Video Automatic Causal Testing System and a Benchmark
Haotong Yang
Qingyuan Zheng
Yunjian Gao
Yongkun Yang
Yangbo He
Zhouchen Lin
Muhan Zhang
VGen
CML
80
0
0
08 Mar 2025
LLM Multi-Agent Systems: Challenges and Open Problems
Shanshan Han
Qifan Zhang
Yuhang Yao
Weizhao Jin
Zhaozhuo Xu
LLMAG
70
39
0
05 Feb 2024
UniVG: Towards UNIfied-modal Video Generation
Ludan Ruan
Lei Tian
Chuanwei Huang
Xu Zhang
Xinyan Xiao
VGen
DiffM
52
3
0
17 Jan 2024
Latte: Latent Diffusion Transformer for Video Generation
Xin Ma
Yaohui Wang
Gengyun Jia
Xinyuan Chen
Ziqiang Liu
Yuan-Fang Li
Cunjian Chen
Yu Qiao
DiffM
VGen
168
252
0
05 Jan 2024
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
Jiaxi Lv
Yi Huang
Mingfu Yan
Jiancheng Huang
Jianzhuang Liu
Yifan Liu
Yafei Wen
Xiaoxin Chen
Shifeng Chen
VGen
DiffM
55
23
0
21 Nov 2023
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
Xinyuan Chen
Yaohui Wang
Lingjun Zhang
Shaobin Zhuang
Xin Ma
Jiashuo Yu
Yali Wang
Dahua Lin
Yu Qiao
Ziwei Liu
VGen
DiffM
39
137
0
31 Oct 2023
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
David Junhao Zhang
Jay Zhangjie Wu
Jia-Wei Liu
Rui Zhao
L. Ran
Yuchao Gu
Difei Gao
Mike Zheng Shou
DiffM
VGen
66
217
0
27 Sep 2023
StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation
Yuhan Wang
Liming Jiang
Chen Change Loy
VGen
61
15
0
31 Aug 2023
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks
Haiyang Xu
Qinghao Ye
Xuan-Wei Wu
Mingshi Yan
Yuan Miao
...
Qingfang Qian
Maofei Que
Ji Zhang
Xiaoyan Zeng
Feiyan Huang
VLM
MLLM
77
23
0
07 Jun 2023
Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents
Yashar Talebirad
Amirhossein Nadiri
LLMAG
85
209
0
05 Jun 2023
Detector Guidance for Multi-Object Text-to-Image Generation
Luping Liu
Zijian Zhang
Yi Ren
Rongjie Huang
Xiang Yin
Zhou Zhao
DiffM
52
10
0
04 Jun 2023
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Ming-Ting Sun
Xinxin Zhu
Qingbin Liu
107
101
0
29 May 2023
Towards Building the Federated GPT: Federated Instruction Tuning
Jianyi Zhang
Saeed Vahidian
Martin Kuo
Chunyuan Li
Ruiyi Zhang
Tong Yu
Yufan Zhou
Guoyin Wang
Yiran Chen
ALM
FedML
60
122
0
09 May 2023
Adding Conditional Control to Text-to-Image Diffusion Models
Lvmin Zhang
Anyi Rao
Maneesh Agrawala
AI4CE
77
4,015
1
10 Feb 2023
Phenaki: Variable Length Video Generation From Open Domain Textual Description
Ruben Villegas
Mohammad Babaeizadeh
Pieter-Jan Kindermans
Hernan Moraldo
Han Zhang
M. Saffar
Santiago Castro
Julius Kunze
D. Erhan
DiffM
VGen
101
381
0
05 Oct 2022
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
Mingyuan Zhang
Zhongang Cai
Liang Pan
Fangzhou Hong
Xinying Guo
Lei Yang
Ziwei Liu
DiffM
VGen
77
558
0
31 Aug 2022
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
Chenfei Wu
Jian Liang
Xiaowei Hu
Zhe Gan
Jianfeng Wang
Lijuan Wang
Zicheng Liu
Yuejian Fang
Nan Duan
VGen
61
72
0
20 Jul 2022
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
290
585
0
29 May 2022
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer
Songwei Ge
Thomas Hayes
Harry Yang
Xiaoyue Yin
Guan Pang
David Jacobs
Jia-Bin Huang
Devi Parikh
ViT
84
217
0
07 Apr 2022
Video Diffusion Models
Jonathan Ho
Tim Salimans
Alexey A. Gritsenko
William Chan
Mohammad Norouzi
David J. Fleet
DiffM
VGen
145
1,563
0
07 Apr 2022
Generative Adversarial Networks
Gilad Cohen
Raja Giryes
GAN
177
30,069
0
01 Mar 2022
Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks
Sihyun Yu
Jihoon Tack
Sangwoo Mo
Hyunsu Kim
Junho Kim
Jung-Woo Ha
Jinwoo Shin
DiffM
VGen
82
200
0
21 Feb 2022
StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
Ivan Skorokhodov
Sergey Tulyakov
Mohamed Elhoseiny
VGen
71
285
0
29 Dec 2021
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Chenfei Wu
Jian Liang
Lei Ji
Fan Yang
Yuejian Fang
Daxin Jiang
Nan Duan
ViT
VGen
52
294
0
24 Nov 2021
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Hongwei Xue
Tiankai Hang
Yanhong Zeng
Yuchong Sun
Bei Liu
Huan Yang
Jianlong Fu
B. Guo
AI4TS
VLM
56
191
0
19 Nov 2021
VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan
Yunzhi Zhang
Pieter Abbeel
A. Srinivas
ViT
VGen
285
495
0
20 Apr 2021
NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video
Jiaming Sun
Yiming Xie
Linghao Chen
Xiaowei Zhou
Hujun Bao
3DV
51
291
0
01 Apr 2021
Implicit Neural Representations with Periodic Activation Functions
Vincent Sitzmann
Julien N. P. Martel
Alexander W. Bergman
David B. Lindell
Gordon Wetzstein
AI4TS
105
2,516
0
17 Jun 2020
Softmax Splatting for Video Frame Interpolation
Simon Niklaus
Feng Liu
125
387
0
11 Mar 2020
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
91
1,192
0
07 Jun 2019
Train Sparsely, Generate Densely: Memory-efficient Unsupervised Training of High-resolution Temporal GAN
Masaki Saito
Shunta Saito
Masanori Koyama
Sosuke Kobayashi
49
146
0
22 Nov 2018
How2: A Large-scale Dataset for Multimodal Language Understanding
Ramon Sanabria
Ozan Caglayan
Shruti Palaskar
Desmond Elliott
Loïc Barrault
Lucia Specia
Florian Metze
VGen
MLLM
61
287
0
01 Nov 2018
Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos
Gunnar Sigurdsson
Abhinav Gupta
Cordelia Schmid
Ali Farhadi
Alahari Karteek
SLR
EgoV
44
163
0
25 Apr 2018
To Create What You Tell: Generating Videos from Captions
Yingwei Pan
Zhaofan Qiu
Ting Yao
Houqiang Li
Tao Mei
GAN
52
153
0
23 Apr 2018
Imagine This! Scripts to Compositions to Videos
Tanmay Gupta
Dustin Schwenk
Ali Farhadi
Derek Hoiem
Aniruddha Kembhavi
CoGe
VGen
132
89
0
10 Apr 2018
Neural Discrete Representation Learning
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL
SSL
OCL
166
4,928
0
02 Nov 2017
Generative Adversarial Networks: An Overview
Antonia Creswell
Tom White
Vincent Dumoulin
Kai Arulkumaran
B. Sengupta
Anil A Bharath
GAN
90
3,005
0
19 Oct 2017
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
93
940
0
04 Aug 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
206
7,961
0
22 May 2017
Towards Automatic Learning of Procedures from Web Instructional Videos
Luowei Zhou
Chenliang Xu
Jason J. Corso
EgoV
59
819
0
28 Mar 2017
Sync-DRAW: Automatic Video Generation using Deep Recurrent Attentive Architectures
Gaurav Mittal
Tanya Marwah
V. Balasubramanian
VGen
DiffM
57
67
0
30 Nov 2016
Generating Videos with Scene Dynamics
Carl Vondrick
Hamed Pirsiavash
Antonio Torralba
GAN
VGen
156
1,465
0
08 Sep 2016
Improved Techniques for Training GANs
Tim Salimans
Ian Goodfellow
Wojciech Zaremba
Vicki Cheung
Alec Radford
Xi Chen
GAN
383
8,999
0
10 Jun 2016
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
Gunnar Sigurdsson
Gül Varol
Xinyu Wang
Ali Farhadi
Ivan Laptev
Abhinav Gupta
VGen
77
1,238
0
06 Apr 2016
Communication-Efficient Learning of Deep Networks from Decentralized Data
H. B. McMahan
Eider Moore
Daniel Ramage
S. Hampson
Blaise Agüera y Arcas
FedML
234
17,328
0
17 Feb 2016
Rethinking the Inception Architecture for Computer Vision
Christian Szegedy
Vincent Vanhoucke
Sergey Ioffe
Jonathon Shlens
Z. Wojna
3DV
BDL
497
27,231
0
02 Dec 2015