arXiv:2205.15868
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
29 May 2022
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
Papers citing
"CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers"
50 / 458 papers shown
LatentMove: Towards Complex Human Movement Video Generation
Ashkan Taghipour
Morteza Ghahremani
Mohammed Bennamoun
F. Boussaïd
Aref Miri Rekavandi
Zinuo Li
Qiuhong Ke
Hamid Laga
3DH
VGen
78
0
0
01 Jul 2025
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Gaojie Lin
Jianwen Jiang
Jiaqi Yang
Zerong Zheng
Chao Liang
DiffM
VGen
391
29
0
01 Jul 2025
FastInit: Fast Noise Initialization for Temporally Consistent Video Generation
Chengyu Bai
Yuming Li
Zhongyu Zhao
Jintao Chen
Peidong Jia
Qi She
Ming Lu
Shanghang Zhang
DiffM
VGen
16
0
0
19 Jun 2025
DAVID-XR1: Detecting AI-Generated Videos with Explainable Reasoning
Yifeng Gao
Yifan Ding
Hongyu Su
Juncheng Li
Yunhan Zhao
...
Li Wang
Xin Wang
Yixu Wang
Xingjun Ma
Yu-Gang Jiang
VGen
14
0
0
13 Jun 2025
TARDIS STRIDE: A Spatio-Temporal Road Image Dataset and World Model for Autonomy
Héctor Carrión
Yutong Bai
Víctor A. Hernández Castro
Kishan Panaganti
Ayush Zenith
Matthew Trang
Tony Zhang
Pietro Perona
Jitendra Malik
VGen
28
0
0
12 Jun 2025
RoboSwap: A GAN-driven Video Diffusion Framework For Unsupervised Robot Arm Swapping
Yang Bai
Liudi Yang
George Eskandar
Fengyi Shen
Dong Chen
Mohammad Altillawi
Z. Liu
Gitta Kutyniok
VGen
29
0
0
10 Jun 2025
Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding
Boyu Chen
Siran Chen
Kunchang Li
Qinglin Xu
Yu Qiao
Yali Wang
VOS
33
0
0
09 Jun 2025
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
Zhengyao Lv
Tianlin Pan
Chenyang Si
Zhaoxi Chen
W. Zuo
Ziwei Liu
Kwan-Yee K. Wong
35
0
0
09 Jun 2025
EgoM2P: Egocentric Multimodal Multitask Pretraining
Gen Li
Yutong Chen
Yiqian Wu
Kaifeng Zhao
Marc Pollefeys
Siyu Tang
EgoV
VLM
44
0
0
09 Jun 2025
TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation
M. Kim
Dongjin Kim
Seokju Yun
Jaegul Choo
DiffM
VGen
33
0
0
08 Jun 2025
FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing
Guangzhao Li
Yanming Yang
Chenxi Song
Chi Zhang
DiffM
VGen
112
0
0
05 Jun 2025
Towards Holistic Visual Quality Assessment of AI-Generated Videos: A LLM-Based Multi-Dimensional Evaluation Model
Zelu Qi
Ping Shi
C. Zhang
Shuqi Wang
F. Zhao
Da Pan
Zefeng Ying
EGVM
VGen
146
0
0
05 Jun 2025
LayerFlow: A Unified Model for Layer-aware Video Generation
S. Ji
Hao Luo
Xi Chen
Yuanpeng Tu
Yiyang Wang
Hengshuang Zhao
VGen
OffRL
84
0
0
04 Jun 2025
WorldPrediction: A Benchmark for High-level World Modeling and Long-horizon Procedural Planning
Delong Chen
Willy Chung
Yejin Bang
Ziwei Ji
Pascale Fung
VGen
LM&Ro
76
0
0
04 Jun 2025
Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas
Austin Silveria
Soham V. Govande
Daniel Y. Fu
26
0
0
03 Jun 2025
DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation
Zhengyao Lv
Chenyang Si
Tianlin Pan
Zhaoxi Chen
Kwan-Yee K. Wong
Yu Qiao
Ziwei Liu
DiffM
VGen
48
0
0
03 Jun 2025
SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios
Lingwei Dang
Ruizhi Shao
Hongwen Zhang
Wei Min
Yebin Liu
Qingyao Wu
DiffM
VGen
95
0
0
03 Jun 2025
CamCloneMaster: Enabling Reference-based Camera Control for Video Generation
Yawen Luo
J. Bai
Xiaoyu Shi
Menghan Xia
Xintao Wang
Pengfei Wan
Di Zhang
Kun Gai
Tianfan Xue
DiffM
VGen
58
0
0
03 Jun 2025
HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation
Yicheng Xiao
Lin Song
Rui Yang
Cheng Cheng
Zunnan Xu
Zhaoyang Zhang
Yixiao Ge
Xiu Li
Ying Shan
60
2
0
03 Jun 2025
OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation
Sen Liang
Zhentao Yu
Zhengguang Zhou
Teng Hu
Hongmei Wang
...
Qin Lin
Yuan Zhou
Xin Li
Qinglin Lu
Zhibo Chen
DiffM
VGen
SyDa
56
0
0
02 Jun 2025
DiffuseSlide: Training-Free High Frame Rate Video Generation Diffusion
Geunmin Hwang
Hyun-kyu Ko
Younghyun Kim
S. W. Lee
Eunbyung Park
VGen
56
0
0
02 Jun 2025
Video Signature: In-generation Watermarking for Latent Video Diffusion Models
Yu Huang
Junhao Chen
Qi Zheng
Hanqian Li
Shuliang Liu
Xuming Hu
DiffM
WIGM
VGen
53
0
0
31 May 2025
Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction
Chenyou Fan
Fangzheng Yan
Chenjia Bai
Jiepeng Wang
Fangqiu Yi
Zhen Wang
Xuelong Li
VGen
507
0
0
30 May 2025
UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation
Yang-tian Sun
Xin Yu
Zehuan Huang
Yi-Hua Huang
Yuan-Chen Guo
Ziyi Yang
Yan-Pei Cao
Xiaojuan Qi
DiffM
VGen
MDE
56
1
0
30 May 2025
A Survey of Generative Categories and Techniques in Multimodal Large Language Models
Longzhen Han
Awes Mubarak
Almas Baimagambetov
Nikolaos Polatidis
Thar Baker
LRM
60
0
0
29 May 2025
MOVi: Training-free Text-conditioned Multi-Object Video Generation
Aimon Rahman
Jiang Liu
Ze Wang
Ximeng Sun
Jialian Wu
Xiaodong Yu
Yusheng Su
Vishal M. Patel
Zicheng Liu
Emad Barsoum
DiffM
VGen
69
0
0
29 May 2025
ATI: Any Trajectory Instruction for Controllable Video Generation
Angtian Wang
Haibin Huang
Jacob Zhiyuan Fang
Yiding Yang
Chongyang Ma
DiffM
VGen
79
0
0
28 May 2025
GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control
Anthony Chen
Wenzhao Zheng
Yida Wang
Xueyang Zhang
Kun Zhan
Peng Jia
Kurt Keutzer
Shanghang Zhang
105
1
0
28 May 2025
Incorporating Flexible Image Conditioning into Text-to-Video Diffusion Models without Training
Bolin Lai
Sangmin Lee
Xu Cao
Xiang Li
James M. Rehg
DiffM
72
0
0
27 May 2025
Sci-Fi: Symmetric Constraint for Frame Inbetweening
Liuhan Chen
Xiaodong Cun
Xiaoyu Li
Xianyi He
Shenghai Yuan
Jie Chen
Ying Shan
Li Yuan
VGen
81
0
0
27 May 2025
MotionPro: A Precise Motion Controller for Image-to-Video Generation
Zhongwei Zhang
Fuchen Long
Zhaofan Qiu
Yingwei Pan
Wu Liu
Ting Yao
Tao Mei
DiffM
VGen
71
1
0
26 May 2025
AniCrafter: Customizing Realistic Human-Centric Animation via Avatar-Background Conditioning in Video Diffusion Models
Muyao Niu
Mingdeng Cao
Yifan Zhan
Qingtian Zhu
Mingze Ma
Jiancheng Zhao
Yanhong Zeng
Zhihang Zhong
Xiao Sun
Yinqiang Zheng
DiffM
VGen
66
0
0
26 May 2025
DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving
Chen Shi
Shaoshuai Shi
Kehua Sheng
Bo Zhang
Li Jiang
VGen
86
0
0
25 May 2025
Model Already Knows the Best Noise: Bayesian Active Noise Selection via Attention in Video Diffusion Model
Kwanyoung Kim
Sanghyun Kim
DiffM
VGen
217
0
0
23 May 2025
Conditional Panoramic Image Generation via Masked Autoregressive Modeling
Chaoyang Wang
Xiangtai Li
Lu Qi
X. Lin
Jinbin Bai
Qianyu Zhou
Yunhai Tong
DiffM
87
1
0
22 May 2025
A Challenge to Build Neuro-Symbolic Video Agents
Sahil Shah
Harsh Goel
Sai Shankar Narasimhan
Minkyu Choi
S P Sharan
Oguzhan Akcin
Sandeep Chinchali
AI4TS
78
0
0
20 May 2025
RoPECraft: Training-Free Motion Transfer with Trajectory-Guided RoPE Optimization on Diffusion Transformers
Ahmet Berke Gokmen
Yigit Ekin
Bahri Batuhan Bilecen
Aysegül Dündar
164
0
0
19 May 2025
Video-GPT via Next Clip Diffusion
Shaobin Zhuang
Zhipeng Huang
Ying Zhang
Fangyikang Wang
Canmiao Fu
Binxin Yang
Chong Sun
Chen Li
Yali Wang
DiffM
VGen
243
0
0
18 May 2025
Generative Pre-trained Autoregressive Diffusion Transformer
Yuan Zhang
Jiacheng Jiang
Guoqing Ma
Zhiying Lu
Haoyang Huang
Jianlong Yuan
Nan Duan
VGen
138
2
0
12 May 2025
ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images
Xianghao Kong
Qiaosong Qi
Yuanbin Wang
Anyi Rao
Biaolong Chen
Aixi Zhang
Si Liu
Hao Jiang
DiffM
VGen
67
1
0
10 May 2025
T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models
Xuyang Guo
Jiayan Huo
Zhenmei Shi
Zhao Song
Jiahao Zhang
Jiale Zhao
VGen
503
2
0
08 May 2025
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Teng Hu
Zhentao Yu
Zhengguang Zhou
Sen Liang
Yuan Zhou
Qin Lin
Qinglin Lu
DiffM
VGen
195
6
0
07 May 2025
DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization
Wenchuan Wang
Mengqi Huang
Yijing Tu
Zhendong Mao
VGen
128
0
0
04 May 2025
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
Zongxia Li
Xiyang Wu
Guangyao Shi
Yubin Qin
Hongyang Du
Tianyi Zhou
Dinesh Manocha
Jordan Lee Boyd-Graber
MLLM
148
0
0
02 May 2025
T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation
Xuyang Guo
Jiayan Huo
Zhenmei Shi
Zhao Song
Jiahao Zhang
Jiale Zhao
EGVM
VGen
PINN
193
5
0
01 May 2025
Controllable Weather Synthesis and Removal with Video Diffusion Models
Chih-Hao Lin
Ziyi Wang
Ruofan Liang
Yuxuan Zhang
Sanja Fidler
Shenlong Wang
Zan Gojcic
DiffM
VGen
70
1
0
01 May 2025
Direct Motion Models for Assessing Generated Videos
Kelsey R. Allen
Carl Doersch
Guangyao Zhou
Mohammed Suhail
Danny Driess
...
Thomas Kipf
Mehdi S. M. Sajjadi
Kevin P. Murphy
João Carreira
Sjoerd van Steenkiste
EGVM
DiffM
VGen
163
0
0
30 Apr 2025
Simple Visual Artifact Detection in Sora-Generated Videos
Misora Sugiyama
Hirokatsu Kataoka
EGVM
85
0
0
30 Apr 2025
A Survey of Interactive Generative Video
Jiwen Yu
Yiran Qin
Haoxuan Che
Quande Liu
Xinyu Wang
Pengfei Wan
Di Zhang
Kun Gai
Hao Chen
Xihui Liu
VGen
109
3
0
30 Apr 2025
Symbolic Representation for Any-to-Any Generative Tasks
Jianfei Chen
Xiaoye Zhu
Yanjie Wang
Tianyang Liu
Xinhui Chen
...
Yifei Ke
Qingbin Liu
Yiwen Yuan
Julian McAuley
Li Li
DiffM
78
0
0
24 Apr 2025