Latte: Latent Diffusion Transformer for Video Generation

5 January 2024
Xin Ma
Yaohui Wang
Gengyun Jia
Xinyuan Chen
Ziqiang Liu
Yuan-Fang Li
Cunjian Chen
Yu Qiao
    DiffM
    VGen

Papers citing "Latte: Latent Diffusion Transformer for Video Generation"

50 / 271 papers shown
Title
Faster Image2Video Generation: A Closer Look at CLIP Image Embedding's Impact on Spatio-Temporal Cross-Attentions
Ashkan Taghipour
Morteza Ghahremani
Bennamoun
Aref Miri Rekavandi
Zinuo Li
Hamid Laga
F. Boussaïd
VGen
89
3
0
27 Jul 2024
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Sherwin Bahmani
Ivan Skorokhodov
Aliaksandr Siarohin
Willi Menapace
Guocheng Qian
...
Chaoyang Wang
Jiaxu Zou
Andrea Tagliasacchi
David B. Lindell
Sergey Tulyakov
VGen
DiffM
167
46
0
17 Jul 2024
Scaling Diffusion Transformers to 16 Billion Parameters
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Debang Li
Junshi Huang
DiffM
MoE
85
20
0
16 Jul 2024
A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights
Wentao Lei
Jinting Wang
Fengji Ma
Guanjie Huang
Li Liu
VGen
EGVM
78
8
0
11 Jul 2024
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
Xuan Ju
Yiming Gao
Zhaoyang Zhang
Ziyang Yuan
Xintao Wang
Ailing Zeng
Yu Xiong
Qiang Xu
Ying Shan
VGen
99
43
0
08 Jul 2024
VIMI: Grounding Video Generation through Multi-modal Instruction
Yuwei Fang
Willi Menapace
Aliaksandr Siarohin
Tsai-Shien Chen
Kuan-Chien Wang
Ivan Skorokhodov
Graham Neubig
Sergey Tulyakov
VGen
118
2
0
08 Jul 2024
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
Kepan Nan
Rui Xie
Penghao Zhou
Tiehan Fan
Zhenheng Yang
Zhijie Chen
Xiang Li
Jian Yang
Ying Tai
130
84
0
02 Jul 2024
MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
Yuang Zhang
Jiaxi Gu
Li-Wen Wang
Han Wang
Junqi Cheng
Yuefeng Zhu
Fangyuan Zou
VGen
103
78
0
28 Jun 2024
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
Shenghai Yuan
Jinfa Huang
Yongqi Xu
Yaoyang Liu
Shaofeng Zhang
Yujun Shi
Ruijie Zhu
Xinhua Cheng
Jiebo Luo
Li Yuan
EGVM
VGen
100
36
0
26 Jun 2024
Diffusion Model-Based Video Editing: A Survey
Wenhao Sun
Rong-Cheng Tu
Jingyi Liao
Dacheng Tao
VGen
74
23
0
26 Jun 2024
MotionBooth: Motion-Aware Customized Text-to-Video Generation
Jianzong Wu
Xiangtai Li
Yanhong Zeng
Jiangning Zhang
Qianyu Zhou
Yining Li
Yunhai Tong
Kai Chen
DiffM
VGen
109
45
0
25 Jun 2024
MVOC: a training-free multiple video object composition method with diffusion models
Wei Wang
Yaosen Chen
Yuegen Liu
Qi Yuan
Shubin Yang
Yanru Zhang
DiffM
85
2
0
22 Jun 2024
Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model
Min Zhao
Hongzhou Zhu
Chendong Xiang
Kaiwen Zheng
Chongxuan Li
Jun Zhu
85
10
0
22 Jun 2024
IRASim: Learning Interactive Real-Robot Action Simulators
Fangqi Zhu
Hongtao Wu
Song Guo
Yuxiao Liu
Chilam Cheang
Tao Kong
94
21
0
20 Jun 2024
Neural Residual Diffusion Models for Deep Scalable Vision Generation
Zhiyuan Ma
Liangliang Zhao
Biqing Qi
Bowen Zhou
DiffM
94
3
0
19 Jun 2024
ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models
Kaifeng Gao
Jiaxin Shi
Hanwang Zhang
Chunping Wang
Jun Xiao
DiffM
VGen
108
14
0
16 Jun 2024
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
Junke Wang
Yi Jiang
Zehuan Yuan
Binyue Peng
Zuxuan Wu
Yu-Gang Jiang
ViT
VGen
104
42
0
13 Jun 2024
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
Tianle Zhang
Langtian Ma
Yuchen Yan
Yuchen Zhang
Kai Wang
...
Wenqi Shao
Yang You
Yu Qiao
Ping Luo
Kaipeng Zhang
VGen
103
2
0
13 Jun 2024
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation
Kai Wang
Shijian Deng
Jing Shi
Dimitrios Hatzinakos
Yapeng Tian
VGen
88
11
0
11 Jun 2024
Compositional Video Generation as Flow Equalization
Xingyi Yang
Xinchao Wang
DiffM
VGen
91
14
0
10 Jun 2024
Vript: A Video Is Worth Thousands of Words
Dongjie Yang
Suyuan Huang
Chengqiang Lu
Xiaodong Han
Haoxin Zhang
Yan Gao
Yao Hu
Hai Zhao
VGen
101
29
0
10 Jun 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Lin Chen
Xilin Wei
Jinsong Li
Xiaoyi Dong
Pan Zhang
...
Li Yuan
Yu Qiao
Dahua Lin
Feng Zhao
Jiaqi Wang
105
167
0
06 Jun 2024
VideoTetris: Towards Compositional Text-to-Video Generation
Ye Tian
Ling Yang
Haotian Yang
Yuan Gao
Yufan Deng
...
Zhaochen Yu
Xin Tao
Pengfei Wan
Di Zhang
Bin Cui
DiffM
VGen
110
18
0
06 Jun 2024
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion
Hao Wen
Zehuan Huang
Yaohui Wang
Xinyuan Chen
Yu Qiao
138
9
0
05 Jun 2024
CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation
Dejia Xu
Weili Nie
Chao Liu
Sifei Liu
Jan Kautz
Zhangyang Wang
Arash Vahdat
DiffM
VGen
115
55
0
04 Jun 2024
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
Tianchen Zhao
Tongcheng Fang
Haofeng Huang
Enshu Liu
Widyadewi Soedarmadji
...
Shengen Yan
Huazhong Yang
Xuefei Ning
Yu Wang
MQ
VGen
164
33
0
04 Jun 2024
CV-VAE: A Compatible Video VAE for Latent Generative Video Models
Sijie Zhao
Yong Zhang
Xiaodong Cun
Shaoshu Yang
Muyao Niu
Xiaoyu Li
Wenbo Hu
Ying Shan
DiffM
94
26
0
30 May 2024
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark
Haoxing Chen
Yan Hong
Zizheng Huang
Zhuoer Xu
Zhangxuan Gu
...
Jun Lan
Huijia Zhu
Jianfu Zhang
Weiqiang Wang
Huaxiong Li
Mamba
117
18
0
30 May 2024
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
Lianghui Zhu
Zilong Huang
Bencheng Liao
Jun Hao Liew
Hanshu Yan
Jiashi Feng
Xinggang Wang
94
15
0
28 May 2024
Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer
Ruizhi Shao
Youxin Pang
Zerong Zheng
Jingxiang Sun
Yebin Liu
VGen
88
16
0
27 May 2024
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
Kai Wang
Yukun Zhou
Mingjia Shi
Zhihang Yuan
Yuzhang Shang
Hanwang Zhang
Yang You
104
14
0
27 May 2024
Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation
Shentong Mo
Yapeng Tian
Mamba
83
17
0
24 May 2024
PipeFusion: Displaced Patch Pipeline Parallelism for Inference of Diffusion Transformer Models
Jiannan Wang
Jiarui Fang
Aoyu Li
PengCheng Yang
AI4CE
84
8
0
23 May 2024
FIFO-Diffusion: Generating Infinite Videos from Text without Training
Jihwan Kim
Junoh Kang
Jinyoung Choi
Bohyung Han
DiffM
VGen
89
31
0
19 May 2024
From Sora What We Can See: A Survey of Text-to-Video Generation
Rui Sun
Yumin Zhang
Tejal Shah
Jiahao Sun
Shuoying Zhang
Wenqi Li
Haoran Duan
Bo Wei
R. Ranjan
EGVM
103
20
0
17 May 2024
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Zheng Zhu
Xiaofeng Wang
Wangbo Zhao
Chen Min
Nianchen Deng
...
Dawei Zhao
Liang Xiao
Jian-jun Zhao
Jiwen Lu
Guan Huang
VGen
LM&Ro
122
44
0
06 May 2024
Video Diffusion Models: A Survey
Andrew Melnik
Michal Ljubljanac
Cong Lu
Qi Yan
Weiming Ren
Helge J. Ritter
VGen
103
14
0
06 May 2024
Matten: Video Generation with Mamba-Attention
Yu Gao
Jiancheng Huang
Xiaopeng Sun
Zequn Jie
Yujie Zhong
Lin Ma
120
13
0
05 May 2024
Beyond Deepfake Images: Detecting AI-Generated Videos
Danial Samadi Vahdati
Tai D. Nguyen
Aref Azizpour
Matthew C. Stamm
104
13
0
24 Apr 2024
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
Han Lin
Jaemin Cho
Abhaysinh Zala
Mohit Bansal
DiffM
VGen
87
26
0
15 Apr 2024
LoopAnimate: Loopable Salient Object Animation
Fanyi Wang
Peng Liu
Haotian Hu
Dan Meng
Jingwen Su
Jinjin Xu
Yanhao Zhang
Xiaoming Ren
Zhiwang Zhang
VGen
54
2
0
14 Apr 2024
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
Shenghai Yuan
Jinfa Huang
Yujun Shi
Yongqi Xu
Ruijie Zhu
Bin Lin
Xinhua Cheng
Li-xin Yuan
Jiebo Luo
VGen
133
35
0
07 Apr 2024
TC4D: Trajectory-Conditioned Text-to-4D Generation
Sherwin Bahmani
Xian Liu
Yifan Wang
Ivan Skorokhodov
Victor Rong
...
Jeong Joon Park
Sergey Tulyakov
Gordon Wetzstein
Andrea Tagliasacchi
David B. Lindell
123
37
0
26 Mar 2024
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework
Zhengqing Yuan
Ruoxi Chen
Zhaoxu Li
Haolong Jia
Lifang He
Chi Wang
Lichao Sun
VGen
79
27
0
20 Mar 2024
Endora: Video Generation Models as Endoscopy Simulators
Chenxin Li
Hengyu Liu
Yifan Liu
Brandon Yushan Feng
Wuyang Li
Xinyu Liu
Zhen Chen
Jing Shao
Yixuan Yuan
VGen
MedIm
95
39
0
17 Mar 2024
EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Linrui Tian
Qi Wang
Bang Zhang
Liefeng Bo
DiffM
98
113
0
27 Feb 2024
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Yixin Liu
Kai Zhang
Yuan Li
Zhiling Yan
Chujie Gao
...
Yue Huang
Hanchi Sun
Jianfeng Gao
Lifang He
Lichao Sun
VLM
VGen
EGVM
101
291
0
27 Feb 2024
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering
Ziyu Ma
Shutao Li
Bin Sun
Jianfei Cai
Zuxiang Long
Fuyan Ma
52
2
0
04 Feb 2024
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Dan Kondratyuk
Lijun Yu
Xiuye Gu
José Lezama
Jonathan Huang
...
Irfan Essa
Huisheng Wang
David A. Ross
Bryan Seybold
Lu Jiang
VGen
83
259
0
21 Dec 2023
Photorealistic Video Generation with Diffusion Models
Agrim Gupta
Lijun Yu
Kihyuk Sohn
Xiuye Gu
Meera Hahn
Fei-Fei Li
Irfan Essa
Lu Jiang
José Lezama
VGen
83
193
0
11 Dec 2023