2311.15127
Cited By
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
25 November 2023
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
Dominik Lorenz
Yam Levi
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
ArXiv (abs)
PDF
HTML
Github (25943★)
Papers citing
"Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets"
50 / 332 papers shown
DepthART: Monocular Depth Estimation as Autoregressive Refinement Task
Bulat Gabdullin
Nina Konovalova
Nikolay Patakin
Dmitry Senushkin
Anton Konushin
MDE
73
1
0
01 Jul 2025
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Gaojie Lin
Jianwen Jiang
Jiaqi Yang
Zerong Zheng
Chao Liang
DiffM
VGen
383
29
0
01 Jul 2025
Edit360: 2D Image Edits to 3D Assets from Any Angle
Junchao Huang
Xinting Hu
Zhuotao Tian
Shaoshuai Shi
Li Jiang
VGen
118
0
0
01 Jul 2025
LatentMove: Towards Complex Human Movement Video Generation
Ashkan Taghipour
Morteza Ghahremani
Mohammed Bennamoun
F. Boussaïd
Aref Miri Rekavandi
Zinuo Li
Qiuhong Ke
Hamid Laga
3DH
VGen
74
0
0
01 Jul 2025
Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition
Jiaqi Li
Junshu Tang
Zhiyong Xu
Longhuang Wu
Yuan Zhou
Shuai Shao
Tianbao Yu
Zhiguo Cao
Qinglin Lu
DiffM
VGen
12
0
0
20 Jun 2025
Emergent Temporal Correspondences from Video Diffusion Transformers
Jisu Nam
Soowon Son
Dahyun Chung
Jiyoung Kim
Siyoon Jin
Junhwa Hur
Seungryong Kim
VGen
23
0
0
20 Jun 2025
FastInit: Fast Noise Initialization for Temporally Consistent Video Generation
Chengyu Bai
Yuming Li
Zhongyu Zhao
Jintao Chen
Peidong Jia
Qi She
Ming Lu
Shanghang Zhang
DiffM
VGen
16
0
0
19 Jun 2025
Show-o2: Improved Native Unified Multimodal Models
Jinheng Xie
Zhenheng Yang
Mike Zheng Shou
VGen
39
0
0
18 Jun 2025
One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution
Yujing Sun
Lingchen Sun
Shuaizheng Liu
Rongyuan Wu
Zhengqiang Zhang
Lei Zhang
DiffM
VGen
52
0
0
18 Jun 2025
UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting
Kai He
Ruofan Liang
Jacob Munkberg
J. Hasselgren
Nandita Vijaykumar
Alexander Keller
Sanja Fidler
Igor Gilitschenski
Zan Gojcic
Zian Wang
36
0
0
18 Jun 2025
EchoShot: Multi-Shot Portrait Video Generation
Jiahao Wang
Hualian Sheng
Sijia Cai
Weizhan Zhang
Caixia Yan
Yachuang Feng
Bing Deng
Jieping Ye
DiffM
VGen
11
0
0
16 Jun 2025
STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation
Jiamin Wang
Yichen Yao
Xiang Feng
Hang Wu
Yaming Wang
Qingqiu Huang
Y. Ma
Xinge Zhu
VGen
29
0
0
16 Jun 2025
Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis
Yuan Gao
Mattia Piccinini
Yuchen Zhang
Dingrui Wang
Korbinian Moller
...
Steven Peters
Andrea Stocco
Bassam Alrifaee
Marco Pavone
Johannes Betz
21
0
0
13 Jun 2025
DAVID-XR1: Detecting AI-Generated Videos with Explainable Reasoning
Yifeng Gao
Yifan Ding
Hongyu Su
Juncheng Li
Yunhan Zhao
...
Li Wang
Xin Wang
Yixu Wang
Xingjun Ma
Yu-Gang Jiang
VGen
7
0
0
13 Jun 2025
DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers
Lizhen Wang
Zhurong Xia
T. Hu
P. Wang
Pengfei Wang
Zerong Zheng
Ming Zhou
DiffM
VGen
115
0
0
12 Jun 2025
GenWorld: Towards Detecting AI-generated Real-world Simulation Videos
Weiliang Chen
Wenzhao Zheng
Yu Zheng
Lei Chen
Jie Zhou
Jiwen Lu
Yueqi Duan
VGen
116
0
0
12 Jun 2025
Fine-Grained Perturbation Guidance via Attention Head Selection
Donghoon Ahn
Jiwon Kang
Sanghyun Lee
Minjae Kim
Jaewon Min
Wooseok Jang
Saungwu Lee
Sayak Paul
S. Hong
Seungryong Kim
DiffM
AAML
123
0
0
12 Jun 2025
AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation
Haoyuan Shi
Yunxin Li
Xinyu Chen
Longyue Wang
Baotian Hu
Min Zhang
DiffM
VGen
101
0
0
12 Jun 2025
SPARKE: Scalable Prompt-Aware Diversity Guidance in Diffusion Models via RKE Score
Mohammad Jalali
Haoyu Lei
Amin Gohari
Farzan Farnia
DiffM
64
0
0
11 Jun 2025
Text-Aware Image Restoration with Diffusion Models
Jaewon Min
J. Kim
Paul Hyunbin Cho
J. Lee
Jihye Park
Minkyu Park
S. Kim
Hyunhee Park
Seungryong Kim
51
0
0
11 Jun 2025
From Pixels to Graphs: using Scene and Knowledge Graphs for HD-EPIC VQA Challenge
Agnese Taluzzi
Davide Gesualdi
Riccardo Santambrogio
Chiara Plizzari
Francesca Palermo
S. Mentasti
Matteo Matteucci
GNN
49
2
0
10 Jun 2025
HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation
Ziyao Huang
Zixiang Zhou
Juan Cao
Yifeng Ma
Yi Chen
...
Hongmei Wang
Qin Lin
Yuan Zhou
Qinglin Lu
Fan Tang
VGen
35
0
0
10 Jun 2025
Context-aware TFL: A Universal Context-aware Contrastive Learning Framework for Temporal Forgery Localization
Qilin Yin
Wei Lu
Xiangyang Luo
Xiaochun Cao
21
0
0
10 Jun 2025
MagCache: Fast Video Generation with Magnitude-Aware Cache
Zehong Ma
Longhui Wei
Feng Wang
Shiliang Zhang
Q. Tian
40
0
0
10 Jun 2025
ProSplat: Improved Feed-Forward 3D Gaussian Splatting for Wide-Baseline Sparse Views
Xiaohan Lu
Jiaye Fu
Jiaqi Zhang
Zetian Song
Chuanmin Jia
Siwei Ma
3DGS
15
0
0
09 Jun 2025
EgoM2P: Egocentric Multimodal Multitask Pretraining
Gen Li
Yutong Chen
Yiqian Wu
Kaifeng Zhao
Marc Pollefeys
Siyu Tang
EgoV
VLM
38
0
0
09 Jun 2025
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
Teng Hu
Zhentao Yu
Zhengguang Zhou
Jiangning Zhang
Yuan Zhou
Qinglin Lu
Ran Yi
VGen
20
0
0
09 Jun 2025
NOVA3D: Normal Aligned Video Diffusion Model for Single Image to 3D Generation
Yuxiao Yang
Peihao Li
Yuhong Zhang
Junzhe Lu
Xianglong He
Minghan Qin
Weitao Wang
Haoqian Wang
DiffM
VGen
17
0
0
09 Jun 2025
Consistent Video Editing as Flow-Driven Image-to-Video Generation
Ge Wang
Songlin Fan
Hangxu Liu
Quanjian Song
Hewei Wang
Jinfeng Xu
DiffM
VGen
29
0
0
09 Jun 2025
Audio-Sync Video Generation with Multi-Stream Temporal Control
Shuchen Weng
Haojie Zheng
Zheng Chang
Si Li
Boxin Shi
Xinlong Wang
DiffM
VGen
28
0
0
09 Jun 2025
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
Sangwon Jang
Taekyung Ki
Jaehyeong Jo
Jaehong Yoon
Soo Ye Kim
Zhe Lin
Sung Ju Hwang
DiffM
VGen
25
0
0
08 Jun 2025
Self-Adapting Improvement Loops for Robotic Learning
Calvin Luo
Zilai Zeng
Mingxi Jia
Yilun Du
Chen Sun
24
0
0
07 Jun 2025
Identity Deepfake Threats to Biometric Authentication Systems: Public and Expert Perspectives
Shijing He
Yaxiong Lei
Zihan Zhang
Yuzhou Sun
S. Li
Chi Zhang
Juan Ye
20
0
0
07 Jun 2025
FADE: Frequency-Aware Diffusion Model Factorization for Video Editing
Yixuan Zhu
Haolin Wang
Shilin Ma
Wenliang Zhao
Yansong Tang
Lei Chen
Jie Zhou
DiffM
VGen
47
0
0
06 Jun 2025
Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision
Yuping He
Yifei Huang
Guo Chen
Lidong Lu
Baoqi Pei
Jilan Xu
Tong Lu
Yoichi Sato
EgoV
77
0
0
06 Jun 2025
Noise Consistency Regularization for Improved Subject-Driven Image Synthesis
Yao Ni
Song Wen
Piotr Koniusz
A. Cherian
17
0
0
06 Jun 2025
Restereo: Diffusion stereo video generation and restoration
Xingchang Huang
Ashish Kumar Singh
Florian Dubost
C. N. Vasconcelos
Sakar Khattar
Liang Shi
Christian Theobalt
Cengiz Öztireli
Gurprit Singh
DiffM
VGen
77
0
0
06 Jun 2025
LLIA -- Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models
Haojie Yu
Zhaonian Wang
Yihan Pan
Meng Cheng
Hao Yang
Chao Wang
Tao Xie
Xiaoming Xu
Xiaoming Wei
Xunliang Cai
VGen
58
0
0
06 Jun 2025
FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation
Huihan Wang
Zhiwen Yang
Hui Zhang
Dan Zhao
Bingzheng Wei
Yan Xu
MedIm
ViT
96
0
0
05 Jun 2025
EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh
Tao Hu
Haoyang Peng
Xiao Liu
Yuewen Ma
VGen
MDE
53
0
0
05 Jun 2025
Controllable Human-centric Keyframe Interpolation with Generative Prior
Z. Guo
Size Wu
Zhongang Cai
Wei Li
Chen Change Loy
DiffM
VGen
50
0
0
03 Jun 2025
Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences
Yunhong Lu
Qichao Wang
H. Cao
Xiaoyin Xu
Min Zhang
47
0
0
03 Jun 2025
SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios
Lingwei Dang
Ruizhi Shao
Hongwen Zhang
Wei Min
Yebin Liu
Qingyao Wu
DiffM
VGen
82
0
0
03 Jun 2025
DiffuseSlide: Training-Free High Frame Rate Video Generation Diffusion
Geunmin Hwang
Hyun-kyu Ko
Younghyun Kim
S. W. Lee
Eunbyung Park
VGen
50
0
0
02 Jun 2025
Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks
Tao Yang
Ruibin Li
Yangming Shi
Yuqi Zhang
Qide Dong
Haoran Cheng
Weiguo Feng
Shilei Wen
Bingyue Peng
Lei Zhang
DiffM
VGen
62
0
0
02 Jun 2025
Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning
Yijun Yang
Zhao-Yang Wang
Qiuping Liu
Shuwen Sun
Kang Wang
...
Zongwei Zhou
Alan Yuille
Lei Zhu
Yu Zhang
Jieneng Chen
26
0
0
02 Jun 2025
OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation
Sen Liang
Zhentao Yu
Zhengguang Zhou
Teng Hu
Hongmei Wang
...
Qin Lin
Yuan Zhou
Xin Li
Qinglin Lu
Zhibo Chen
DiffM
VGen
SyDa
53
0
0
02 Jun 2025
WorldExplorer: Towards Generating Fully Navigable 3D Scenes
Manuel-Andreas Schneider
Lukas Höllein
Matthias Nießner
VGen
51
0
0
02 Jun 2025
SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers
Zhengcong Fei
Hao Jiang
Di Qiu
Baoxuan Gu
Youqiang Zhang
...
Jialin Bai
Debang Li
Mingyuan Fan
Guibin Chen
Yahui Zhou
DiffM
VGen
57
0
0
01 Jun 2025
Temporal In-Context Fine-Tuning for Versatile Control of Video Diffusion Models
Kinam Kim
J. Hyung
Jaegul Choo
DiffM
VGen
37
0
0
01 Jun 2025