Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2408.12590
Cited By
v1
v2 (latest)
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
22 August 2024
Can Qin
Congying Xia
Krithika Ramakrishnan
Michael S Ryoo
Lifu Tu
Yihao Feng
Manli Shu
Honglu Zhou
Anas Awadalla
Jun Wang
Senthil Purushwalkam
Le Xue
Yingbo Zhou
Huan Wang
Silvio Savarese
Juan Carlos Niebles
Zeyuan Chen
Ran Xu
Caiming Xiong
VGen
DiffM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations"
26 / 26 papers shown
Title
Open-Sora: Democratizing Efficient Video Production for All
Zangwei Zheng
Xiangyu Peng
Tianji Yang
Chenhui Shen
Shenggui Li
Hongxin Liu
Yukun Zhou
Tianyi Li
Yang You
VGen
158
256
0
31 Dec 2024
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Le Xue
Manli Shu
Anas Awadalla
Jun Wang
An Yan
...
Zeyuan Chen
Silvio Savarese
Juan Carlos Niebles
Caiming Xiong
Ran Xu
VLM
96
96
0
16 Aug 2024
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
Kepan Nan
Rui Xie
Penghao Zhou
Tiehan Fan
Zhenheng Yang
Zhijie Chen
Xiang Li
Jian Yang
Ying Tai
135
93
0
02 Jul 2024
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Ruohong Zhang
Liangke Gui
Zhiqing Sun
Yihao Feng
Keyang Xu
...
Di Fu
Chunyuan Li
Alexander G. Hauptmann
Yonatan Bisk
Yiming Yang
MLLM
107
78
0
01 Apr 2024
Mixtral of Experts
Albert Q. Jiang
Alexandre Sablayrolles
Antoine Roux
A. Mensch
Blanche Savary
...
Théophile Gervet
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE
LLMAG
155
1,121
0
08 Jan 2024
Latte: Latent Diffusion Transformer for Video Generation
Xin Ma
Yaohui Wang
Gengyun Jia
Xinyuan Chen
Ziqiang Liu
Yuan-Fang Li
Cunjian Chen
Yu Qiao
DiffM
VGen
255
278
0
05 Jan 2024
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
A. Piergiovanni
Isaac Noble
Dahun Kim
Michael S. Ryoo
Victor Gomes
A. Angelova
90
21
0
09 Nov 2023
Scalable Diffusion Models with Transformers
William S. Peebles
Saining Xie
GNN
118
2,434
0
19 Dec 2022
MAGVIT: Masked Generative Video Transformer
Lijun Yu
Yong Cheng
Kihyuk Sohn
José Lezama
Han Zhang
...
Alexander G. Hauptmann
Ming-Hsuan Yang
Yuan Hao
Irfan Essa
Lu Jiang
DiffM
VGen
77
248
0
10 Dec 2022
Imagen Video: High Definition Video Generation with Diffusion Models
Jonathan Ho
William Chan
Chitwan Saharia
Jay Whang
Ruiqi Gao
...
Diederik P. Kingma
Ben Poole
Mohammad Norouzi
David J. Fleet
Tim Salimans
VGen
171
1,542
0
05 Oct 2022
Make-A-Video: Text-to-Video Generation without Text-Video Data
Uriel Singer
Adam Polyak
Thomas Hayes
Xiaoyue Yin
Jie An
...
Oron Ashual
Oran Gafni
Devi Parikh
Sonal Gupta
Yaniv Taigman
DiffM
VGen
83
1,428
0
29 Sep 2022
Classifier-Free Diffusion Guidance
Jonathan Ho
Tim Salimans
FaML
196
3,963
0
26 Jul 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
418
3,607
0
29 Apr 2022
Video Diffusion Models
Jonathan Ho
Tim Salimans
Alexey A. Gritsenko
William Chan
Mohammad Norouzi
David J. Fleet
DiffM
VGen
209
1,640
0
07 Apr 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
496
15,768
0
20 Dec 2021
VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan
Yunzhi Zhang
Pieter Abbeel
A. Srinivas
ViT
VGen
310
512
0
20 Apr 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
315
2,521
0
20 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
163
1,189
0
01 Apr 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
676
41,483
0
22 Oct 2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
488
20,342
0
23 Oct 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
101
555
0
06 Apr 2019
Neural Discrete Representation Learning
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL
SSL
OCL
230
5,079
0
02 Nov 2017
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
125
949
0
04 Aug 2017
Towards Automatic Learning of Procedures from Web Instructional Videos
Luowei Zhou
Chenliang Xu
Jason J. Corso
EgoV
75
831
0
28 Mar 2017
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
1.9K
77,441
0
18 May 2015
A Dataset for Movie Description
Anna Rohrbach
Marcus Rohrbach
Niket Tandon
Bernt Schiele
VGen
124
502
0
12 Jan 2015
1