ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2401.03048
  4. Cited By
Latte: Latent Diffusion Transformer for Video Generation

Latte: Latent Diffusion Transformer for Video Generation

5 January 2024
Xin Ma
Yaohui Wang
Gengyun Jia
Xinyuan Chen
Ziqiang Liu
Yuan-Fang Li
Cunjian Chen
Yu Qiao
    DiffM
    VGen
ArXivPDFHTML

Papers citing "Latte: Latent Diffusion Transformer for Video Generation"

50 / 271 papers shown
Title
Omegance: A Single Parameter for Various Granularities in
  Diffusion-Based Synthesis
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
Xinyu Hou
Zongsheng Yue
Xiaoming Li
Chen Change Loy
VGen
DiffM
121
0
0
26 Nov 2024
VideoDirector: Precise Video Editing via Text-to-Video Models
VideoDirector: Precise Video Editing via Text-to-Video Models
Yukun Wang
Longguang Wang
Zhiyuan Ma
Qibin Hu
Kai Xu
Yulan Guo
VGen
DiffM
165
0
0
26 Nov 2024
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
Zongjian Li
Bin Lin
Yang Ye
Liuhan Chen
Xinhua Cheng
Shenghai Yuan
Li-xin Yuan
VGen
DiffM
146
19
0
26 Nov 2024
Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric
Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric
Zhichao Zhang
Wei Sun
Xinyue Li
Yunhao Li
Qihang Ge
...
Zhongpeng Ji
Fengyu Sun
Shangling Jui
Xiongkuo Min
Guangtao Zhai
EGVM
204
1
0
25 Nov 2024
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
Kaifeng Gao
Jiaxin Shi
Hanwang Zhang
Chunping Wang
Jun Xiao
Long Chen
VGen
DiffM
158
2
0
25 Nov 2024
LaVin-DiT: Large Vision Diffusion Transformer
Zhaoqing Wang
Xiaobo Xia
Runnan Chen
Dongdong Yu
Changhu Wang
Mingming Gong
Tongliang Liu
141
7
0
18 Nov 2024
StableV2V: Stablizing Shape Consistency in Video-to-Video Editing
Chang-Shu Liu
Rui Li
Kaidong Zhang
Yunwei Lan
Dong Liu
DiffM
VGen
65
6
0
17 Nov 2024
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers
Joseph Liu
Joshua Geddes
Ziyu Guo
Haomiao Jiang
Mahesh Kumar Nandwana
91
1
0
15 Nov 2024
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video
  Generation
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation
Xiaofeng Wang
Kang Zhao
Fan Liu
Jiayu Wang
Guosheng Zhao
Xiaoyi Bao
Zheng Hua Zhu
Yingya Zhang
Xingang Wang
VGen
83
8
0
13 Nov 2024
Latent Space Disentanglement in Diffusion Transformers Enables Precise
  Zero-shot Semantic Editing
Latent Space Disentanglement in Diffusion Transformers Enables Precise Zero-shot Semantic Editing
Zitao Shuai
Chenwei Wu
Zhengxu Tang
Bowen Song
Liyue Shen
DiffM
76
0
0
12 Nov 2024
Artificial Intelligence for Biomedical Video Generation
Artificial Intelligence for Biomedical Video Generation
Linyuan Li
Jianing Qiu
Anujit Saha
Lin Li
Poyuan Li
Mengxian He
Ziyu Guo
Wu Yuan
VGen
121
1
0
12 Nov 2024
Improved Video VAE for Latent Video Diffusion Model
Improved Video VAE for Latent Video Diffusion Model
Pingyu Wu
Kai Zhu
Yu Liu
Liming Zhao
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
VGen
DiffM
68
3
0
10 Nov 2024
I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength
I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength
Wanquan Feng
Jiawei Liu
Pengqi Tu
Tianhao Qi
Mingzhen Sun
Tianxiang Ma
Mingcong Liu
Siyu Zhou
Qian He
VGen
127
8
0
10 Nov 2024
ReCapture: Generative Video Camera Controls for User-Provided Videos
  using Masked Video Fine-Tuning
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
David Junhao Zhang
Roni Paiss
Shiran Zada
Nikhil Karnad
David E. Jacobs
Yael Pritch
Inbar Mosseri
Mike Zheng Shou
Neal Wadhwa
Nataniel Ruiz
DiffM
VGen
129
18
0
07 Nov 2024
xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive
  Parallelism
xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Jiarui Fang
Jinzhe Pan
Xibo Sun
Aoyu Li
Jiannan Wang
96
7
0
04 Nov 2024
Optical Flow Representation Alignment Mamba Diffusion Model for Medical
  Video Generation
Optical Flow Representation Alignment Mamba Diffusion Model for Medical Video Generation
Zhenbin Wang
Lei Zhang
Lituan Wang
Minjuan Zhu
Zhenwei Zhang
VGen
MedIm
76
3
0
03 Nov 2024
GameGen-X: Interactive Open-world Game Video Generation
GameGen-X: Interactive Open-world Game Video Generation
Haoxuan Che
Xuanhua He
Quande Liu
Cheng Jin
Hao Chen
VGen
100
22
0
01 Nov 2024
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
Zongyi Li
Shujie Hu
Shujie Liu
Long Zhou
Jeongsoo Choi
Lingwei Meng
Xun Guo
Jiajian Li
H. Ling
Furu Wei
VGen
DiffM
122
5
0
27 Oct 2024
Are Visual-Language Models Effective in Action Recognition? A
  Comparative Study
Are Visual-Language Models Effective in Action Recognition? A Comparative Study
Mahmoud Ali
Di Yang
François Brémond
VLM
78
0
0
22 Oct 2024
Allegro: Open the Black Box of Commercial-Level Video Generation Model
Allegro: Open the Black Box of Commercial-Level Video Generation Model
Yuan Zhou
Qiuyue Wang
Yuxuan Cai
Huan Yang
VGen
VLM
130
33
0
20 Oct 2024
FrameBridge: Improving Image-to-Video Generation with Bridge Models
FrameBridge: Improving Image-to-Video Generation with Bridge Models
Yuji Wang
Zehua Chen
Xiaoyu Chen
Jun-Jie Zhu
Jianfei Chen
DiffM
VGen
403
5
0
20 Oct 2024
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise
  Motion Control
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control
Yujie Wei
Shiwei Zhang
Hangjie Yuan
Xiang Wang
Haonan Qiu
...
Fan Liu
Zhizhong Huang
Jiaxin Ye
Yingya Zhang
Hongming Shan
DiffM
VGen
95
18
0
17 Oct 2024
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving
  Scene Representation
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation
Guosheng Zhao
Chaojun Ni
Xiaofeng Wang
Zheng Zhu
Xinming Zhang
...
Xinze Chen
Boyuan Wang
Youyi Zhang
Wenjun Mei
Xingang Wang
VGen
110
30
0
17 Oct 2024
Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning
Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning
Aakanksha
Arash Ahmadian
Seraphina Goldfarb-Tarrant
Beyza Ermis
Marzieh Fadaee
Sara Hooker
MoMe
83
12
0
14 Oct 2024
FasterDiT: Towards Faster Diffusion Transformers Training without
  Architecture Modification
FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification
J. Yao
Wang Cheng
Wenyu Liu
Xinggang Wang
68
11
0
14 Oct 2024
Progressive Autoregressive Video Diffusion Models
Progressive Autoregressive Video Diffusion Models
Desai Xie
Zhan Xu
Yicong Hong
Hao Tan
Difan Liu
Feng Liu
Arie E. Kaufman
Yang Zhou
DiffM
VGen
88
11
0
10 Oct 2024
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content
Qiuheng Wang
Yukai Shi
Jiarong Ou
Ruoxin Chen
Ke Lin
...
Mingwu Zheng
Xin Tao
Fei Yang
Pengfei Wan
Di Zhang
VGen
121
26
0
10 Oct 2024
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large
  Vision-Language Models
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models
Rui Zhao
Hangjie Yuan
Yujie Wei
Shiwei Zhang
Yuchao Gu
...
Xiang Wang
Zhangjie Wu
Junhao Zhang
Yingya Zhang
Mike Zheng Shou
DiffM
VLM
80
4
0
09 Oct 2024
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Sihyun Yu
Sangkyung Kwak
Huiwon Jang
Jongheon Jeong
Jonathan Huang
Jinwoo Shin
Saining Xie
OCL
138
85
0
09 Oct 2024
GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge
  for Robot Manipulation
GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation
Chi-Lam Cheang
Guangzeng Chen
Ya Jing
Tao Kong
Hang Li
...
Hongtao Wu
Jiafeng Xu
Yichu Yang
Hanbo Zhang
Minzhao Zhu
VGen
LM&Ro
97
69
0
08 Oct 2024
Pyramidal Flow Matching for Efficient Video Generative Modeling
Pyramidal Flow Matching for Efficient Video Generative Modeling
Yang Jin
Zhicheng Sun
Ningyuan Li
Kun Xu
K. Xu
...
Nan Zhuang
Quzhe Huang
Yang Song
Yadong Mu
Zhouchen Lin
VGen
119
78
0
08 Oct 2024
GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting
GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting
Yukang Cao
Masoud Hadi
Liang Pan
Ziwei Liu
3DGS
DiffM
68
4
0
07 Oct 2024
IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video
  Synthesis
IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis
Shitong Shao
Zikai Zhou
Lichen Bai
Haoyi Xiong
Zeke Xie
VGen
67
1
0
05 Oct 2024
VEDIT: Latent Prediction Architecture For Procedural Video
  Representation Learning
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
Han Lin
Tushar Nagarajan
Nicolas Ballas
Mido Assran
Mojtaba Komeili
Joey Tianyi Zhou
Koustuv Sinha
AI4TS
82
4
0
04 Oct 2024
Dynamic Diffusion Transformer
Dynamic Diffusion Transformer
Wangbo Zhao
Yizeng Han
Jiasheng Tang
Kai Wang
Yibing Song
Gao Huang
Fan Wang
Yang You
97
13
0
04 Oct 2024
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep
  Approach
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach
Yaofang Liu
Y. Ren
Xiaodong Cun
Aitor Artola
Yang Liu
Tieyong Zeng
Raymond H. Chan
Jean-Michel Morel
VGen
DiffM
87
2
0
04 Oct 2024
Denoising with a Joint-Embedding Predictive Architecture
Denoising with a Joint-Embedding Predictive Architecture
Dengsheng Chen
Jie Hu
Xiaoming Wei
Enhua Wu
DiffM
108
2
0
02 Oct 2024
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Hong Chen
Xin Wang
Yuwei Zhou
Bin Huang
Yipeng Zhang
Wei Feng
Houlun Chen
Zeyang Zhang
Siao Tang
Wenwu Zhu
DiffM
105
9
0
23 Sep 2024
FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video
  Dataset
FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video Dataset
Donglin Di
Hao Feng
Wenzhang Sun
Yongjia Ma
Hao Li
Wei Chen
Xiaofei Gou
Tonghua Su
Xun Yang
CVBM
94
2
0
23 Sep 2024
LVCD: Reference-based Lineart Video Colorization with Diffusion Models
LVCD: Reference-based Lineart Video Colorization with Diffusion Models
Zhitong Huang
Mohan Zhang
Jing Liao
DiffM
VGen
70
11
0
19 Sep 2024
AMG: Avatar Motion Guided Video Generation
AMG: Avatar Motion Guided Video Generation
Zhangsihao Yang
Mengyi Shan
Mohammad Farazi
Wenhui Zhu
Yanxi Chen
Xuanzhao Dong
Yalin Wang
VGen
DiffM
89
0
0
02 Sep 2024
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video
  Diffusion Model
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
Liuhan Chen
Zongjian Li
Bin Lin
Bin Zhu
Qian Wang
Shenghai Yuan
X. Zhou
Xinhua Cheng
Li Yuan
DiffM
124
14
0
02 Sep 2024
FLUX that Plays Music
FLUX that Plays Music
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Junshi Huang
109
8
0
01 Sep 2024
Training-free Long Video Generation with Chain of Diffusion Model
  Experts
Training-free Long Video Generation with Chain of Diffusion Model Experts
Wenhao Li
Yichao Cao
Xiu Su
Xi Lin
Shan You
Mingkai Zheng
Yi Chen
Chang Xu
VGen
DiffM
66
0
0
24 Aug 2024
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed
  Representations
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Can Qin
Congying Xia
Krithika Ramakrishnan
Michael S Ryoo
Lifu Tu
...
Silvio Savarese
Juan Carlos Niebles
Zeyuan Chen
Ran Xu
Caiming Xiong
VGen
DiffM
106
3
0
22 Aug 2024
Real-Time Video Generation with Pyramid Attention Broadcast
Real-Time Video Generation with Pyramid Attention Broadcast
Xuanlei Zhao
Xiaolong Jin
Kai Wang
Yang You
VGen
DiffM
114
40
0
22 Aug 2024
Factorized-Dreamer: Training A High-Quality Video Generator with Limited
  and Low-Quality Data
Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data
Tao Yang
Yangming Shi
Yunwen Huang
Feng Chen
Yin Zheng
Lei Zhang
DiffM
VGen
70
0
0
19 Aug 2024
Efficient Diffusion Transformer with Step-wise Dynamic Attention
  Mediators
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
Yifan Pu
Zhuofan Xia
Jiayi Guo
Dongchen Han
Qixiu Li
...
Ji Li
Yizeng Han
Shiji Song
Gao Huang
Xiu Li
93
12
0
11 Aug 2024
MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture
  Generation
MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation
Xiaofeng Mao
Zhengkai Jiang
Qilin Wang
Chencan Fu
Jiangning Zhang
Jiafu Wu
Yabiao Wang
Chengjie Wang
Wei Li
Mingmin Chi
95
4
0
06 Aug 2024
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation
Zhiyu Tan
Xiaomeng Yang
Luozheng Qin
Hao Li
VGen
69
21
0
05 Aug 2024
Previous
123456
Next