ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2205.15868
  4. Cited By
CogVideo: Large-scale Pretraining for Text-to-Video Generation via
  Transformers

CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers

29 May 2022
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
    DiffM
ArXiv (abs)PDFHTML

Papers citing "CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers"

50 / 458 papers shown
Title
SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for
  Short Drama
SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama
Jing Tang
Quanlu Jia
Yuqiang Xie
Zeyu Gong
Xiang Wen
Jiayi Zhang
Yalong Guo
Guibin Chen
Jiangping Yang
VGen
73
1
0
18 Aug 2024
Quality Assessment in the Era of Large Models: A Survey
Quality Assessment in the Era of Large Models: A Survey
Zicheng Zhang
Yingjie Zhou
Chunyi Li
Baixuan Zhao
Xiaohong Liu
Guangtao Zhai
103
12
0
17 Aug 2024
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang
Jiayan Teng
Wendi Zheng
Ming Ding
Shiyu Huang
...
Weihan Wang
Yean Cheng
Xiaotao Gu
Yuxiao Dong
Jie Tang
DiffMVGen
314
565
0
12 Aug 2024
Survey: Transformer-based Models in Data Modality Conversion
Survey: Transformer-based Models in Data Modality Conversion
Elyas Rashno
Amir Eskandari
Aman Anand
F. Zulkernine
MedIm
93
0
0
08 Aug 2024
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks
  With Large Language Model
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model
Zhaowei Li
Wei Wang
Yiqing Cai
Xu Qi
Pengyu Wang
Dong Zhang
Hang Song
Botian Jiang
Zhida Huang
Tao Wang
AIFinLRM
108
5
0
05 Aug 2024
Fine-gained Zero-shot Video Sampling
Fine-gained Zero-shot Video Sampling
Dengsheng Chen
Jie Hu
Javier Segovia-Aguas
Enhua Wu
VGenDiffM
55
0
0
31 Jul 2024
Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model
Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model
Zhichao Zhang
Xinyue Li
Wei Sun
Jun Jia
Xiongkuo Min
...
Puyi Wang
Zhongpeng Ji
Fengyu Sun
Shangling Jui
Guangtao Zhai
EGVM
68
5
0
31 Jul 2024
FreeLong: Training-Free Long Video Generation with SpectralBlend
  Temporal Attention
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention
Yu Lu
Yuanzhi Liang
Linchao Zhu
Yi Yang
DiffMVGen
116
32
0
29 Jul 2024
Fréchet Video Motion Distance: A Metric for Evaluating Motion
  Consistency in Videos
Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos
Jiahe Liu
Youran Qu
Qi Yan
Fangyin Wei
Lele Wang
Renjie Liao
VGenEGVM
69
15
0
23 Jul 2024
Animate3D: Animating Any 3D Model with Multi-view Video Diffusion
Animate3D: Animating Any 3D Model with Multi-view Video Diffusion
Yanqin Jiang
Chaohui Yu
Chenjie Cao
Fan Wang
Weiming Hu
Jin Gao
VGen
80
19
0
16 Jul 2024
MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image
  Synthesis
MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
Wanggui He
Siming Fu
Mushui Liu
Xierui Wang
Wenyi Xiao
...
Zhelun Yu
Haoyuan Li
Ziwei Huang
Leilei Gan
Hao Jiang
DiffM
111
26
0
10 Jul 2024
Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators
Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators
Wentao Zhang
Junliang Guo
Tianyu He
Li Zhao
Linli Xu
Jiang Bian
120
4
0
10 Jul 2024
VIMI: Grounding Video Generation through Multi-modal Instruction
VIMI: Grounding Video Generation through Multi-modal Instruction
Yuwei Fang
Willi Menapace
Aliaksandr Siarohin
Tsai-Shien Chen
Kuan-Chien Wang
Ivan Skorokhodov
Graham Neubig
Sergey Tulyakov
VGen
161
2
0
08 Jul 2024
The Tug-of-War Between Deepfake Generation and Detection
The Tug-of-War Between Deepfake Generation and Detection
Hannah Lee
Changyeon Lee
Kevin Farhat
Lin Qiu
Steve Geluso
Aerin Kim
O. Etzioni
70
2
0
08 Jul 2024
GVDIFF: Grounded Text-to-Video Generation with Diffusion Models
GVDIFF: Grounded Text-to-Video Generation with Diffusion Models
Huanzhang Dou
Ruixiang Li
Wei Su
Xi Li
DiffM
92
1
0
02 Jul 2024
Identifying and Solving Conditional Image Leakage in Image-to-Video
  Diffusion Model
Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model
Min Zhao
Hongzhou Zhu
Chendong Xiang
Kaiwen Zheng
Chongxuan Li
Jun Zhu
120
11
0
22 Jun 2024
VideoScore: Building Automatic Metrics to Simulate Fine-grained Human
  Feedback for Video Generation
VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Xuan He
Dongfu Jiang
Ge Zhang
Max Ku
Achint Soni
...
Yaswanth Narsupalli
Rongqi Fan
Zhiheng Lyu
Yuchen Lin
Wenhu Chen
EGVMVGenALM
136
56
0
21 Jun 2024
Training-free Camera Control for Video Generation
Training-free Camera Control for Video Generation
Chen Hou
Guoqiang Wei
VGenDiffM
201
40
0
14 Jun 2024
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
Junke Wang
Yi Jiang
Zehuan Yuan
Binyue Peng
Zuxuan Wu
Yu-Gang Jiang
ViTVGen
124
46
0
13 Jun 2024
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing
  Reliability,Reproducibility, and Practicality
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability,Reproducibility, and Practicality
Tianle Zhang
Langtian Ma
Yuchen Yan
Yuchen Zhang
Kai Wang
...
Wenqi Shao
Yang You
Yu Qiao
Ping Luo
Kaipeng Zhang
VGen
147
2
0
13 Jun 2024
Vivid-ZOO: Multi-View Video Generation with Diffusion Model
Vivid-ZOO: Multi-View Video Generation with Diffusion Model
Bing Li
Cheng Zheng
Wenxuan Zhu
Jinjie Mai
Biao Zhang
Peter Wonka
Bernard Ghanem
108
17
0
12 Jun 2024
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and
  Image-to-Video Generation
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation
Weixi Feng
Jiachen Li
Michael Stephen Saxon
Tsu-Jui Fu
Wenhu Chen
William Yang Wang
EGVMVGen
75
10
0
12 Jun 2024
Hierarchical Patch Diffusion Models for High-Resolution Video Generation
Hierarchical Patch Diffusion Models for High-Resolution Video Generation
Ivan Skorokhodov
Willi Menapace
Aliaksandr Siarohin
Sergey Tulyakov
VGen
79
10
0
12 Jun 2024
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video
  Prediction
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
Zhen Xing
Qi Dai
Zejia Weng
Zuxuan Wu
Yu-Gang Jiang
VGen
132
14
0
10 Jun 2024
GAIA: Rethinking Action Quality Assessment for AI-Generated Videos
GAIA: Rethinking Action Quality Assessment for AI-Generated Videos
Zijian Chen
Wei Sun
Yuan Tian
Jun Jia
Zicheng Zhang
Jiarui Wang
Ru Huang
Xiongkuo Min
Guangtao Zhai
Wenjun Zhang
EGVM
121
15
0
10 Jun 2024
ProcessPainter: Learn Painting Process from Sequence Data
ProcessPainter: Learn Painting Process from Sequence Data
Yiren Song
Shijie Huang
Chen Yao
Xiaojun Ye
Hai Ci
Jiaming Liu
Yuxuan Zhang
Mike Zheng Shou
DiffM
72
11
0
10 Jun 2024
FRAG: Frequency Adapting Group for Diffusion Video Editing
FRAG: Frequency Adapting Group for Diffusion Video Editing
Sunjae Yoon
Gwanhyeong Koo
Geonwoo Kim
Chang D. Yoo
DiffM
118
5
0
10 Jun 2024
Zero-Shot Video Editing through Adaptive Sliding Score Distillation
Zero-Shot Video Editing through Adaptive Sliding Score Distillation
Lianghan Zhu
Yanqi Bao
Jing Huo
Jing Wu
Yu-Kun Lai
Wenbin Li
Yang Gao
VGen
67
2
0
07 Jun 2024
Coherent Zero-Shot Visual Instruction Generation
Coherent Zero-Shot Visual Instruction Generation
Quynh Phung
Songwei Ge
Jia-Bin Huang
84
2
0
06 Jun 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better
  Captions
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Lin Chen
Xilin Wei
Jinsong Li
Xiaoyi Dong
Pan Zhang
...
Li Yuan
Yu Qiao
Dahua Lin
Feng Zhao
Jiaqi Wang
143
183
0
06 Jun 2024
VideoPhy: Evaluating Physical Commonsense for Video Generation
VideoPhy: Evaluating Physical Commonsense for Video Generation
Hritik Bansal
Zongyu Lin
Tianyi Xie
Zeshun Zong
Michal Yarom
Yonatan Bitton
Chenfanfu Jiang
Ningyu Zhang
Kai-Wei Chang
Aditya Grover
EGVMVGen
112
45
0
05 Jun 2024
Searching Priors Makes Text-to-Video Synthesis Better
Searching Priors Makes Text-to-Video Synthesis Better
Haoran Cheng
Liang Peng
Linxuan Xia
Yuepeng Hu
Hengjia Li
Qinglin Lu
Xiaofei He
Boxi Wu
VGenDiffM
47
0
0
05 Jun 2024
U-KAN Makes Strong Backbone for Medical Image Segmentation and
  Generation
U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation
Chenxin Li
Xinyu Liu
Wenbo Li
Cheng Wang
Hengyu Liu
Yifan Liu
Zhen Chen
Yixuan Yuan
MedImDiffMSSeg
143
145
0
05 Jun 2024
UniAnimate: Taming Unified Video Diffusion Models for Consistent Human
  Image Animation
UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation
Xiang Wang
Shiwei Zhang
Changxin Gao
Jiayu Wang
Xiaoqiang Zhou
Yingya Zhang
Luxin Yan
Nong Sang
VGen
141
41
0
03 Jun 2024
AudioLCM: Text-to-Audio Generation with Latent Consistency Models
AudioLCM: Text-to-Audio Generation with Latent Consistency Models
Huadai Liu
Rongjie Huang
Yang Liu
Hengyuan Cao
Jialei Wang
Xize Cheng
Siqi Zheng
Zhou Zhao
127
10
0
01 Jun 2024
MotionFollower: Editing Video Motion via Lightweight Score-Guided
  Diffusion
MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion
Shuyuan Tu
Qi Dai
Zihao Zhang
Sicheng Xie
Zhi-Qi Cheng
Chong Luo
Xintong Han
Zuxuan Wu
Yu-Gang Jiang
DiffMVGen
75
11
0
30 May 2024
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo
  Benchmark
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark
Haoxing Chen
Yan Hong
Zizheng Huang
Zhuoer Xu
Zhangxuan Gu
...
Jun Lan
Huijia Zhu
Jianfu Zhang
Weiqiang Wang
Huaxiong Li
Mamba
164
21
0
30 May 2024
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model
  with Mixed Reward Feedback
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback
Jiachen Li
Weixi Feng
Tsu-Jui Fu
Xinyi Wang
Sugato Basu
Wenhu Chen
William Y. Wang
VGen
91
34
0
29 May 2024
VividPose: Advancing Stable Video Diffusion for Realistic Human Image
  Animation
VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation
Qilin Wang
Zhengkai Jiang
Chengming Xu
Jiangning Zhang
Yabiao Wang
Xinyi Zhang
Yunkang Cao
Weijian Cao
Chengjie Wang
Yanwei Fu
VGen
90
15
0
28 May 2024
C3LLM: Conditional Multimodal Content Generation Using Large Language
  Models
C3LLM: Conditional Multimodal Content Generation Using Large Language Models
Zixuan Wang
Qinkai Duan
Yu-Wing Tai
Chi-Keung Tang
116
3
0
25 May 2024
Text Prompting for Multi-Concept Video Customization by Autoregressive
  Generation
Text Prompting for Multi-Concept Video Customization by Autoregressive Generation
D. Kothandaraman
Kihyuk Sohn
Ruben Villegas
P. Voigtlaender
Dinesh Manocha
Mohammad Babaeizadeh
VGenDiffM
59
2
0
22 May 2024
DisenStudio: Customized Multi-subject Text-to-Video Generation with
  Disentangled Spatial Control
DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control
Hong Chen
Xin Wang
Yipeng Zhang
Yuwei Zhou
Zeyang Zhang
Siao Tang
Wenwu Zhu
VGenDiffM
75
10
0
21 May 2024
FIFO-Diffusion: Generating Infinite Videos from Text without Training
FIFO-Diffusion: Generating Infinite Videos from Text without Training
Jihwan Kim
Junoh Kang
Jinyoung Choi
Bohyung Han
DiffMVGen
111
36
0
19 May 2024
From Sora What We Can See: A Survey of Text-to-Video Generation
From Sora What We Can See: A Survey of Text-to-Video Generation
Rui Sun
Yumin Zhang
Tejal Shah
Jiahao Sun
Shuoying Zhang
Wenqi Li
Haoran Duan
Bo Wei
R. Ranjan
EGVM
131
20
0
17 May 2024
The Lost Melody: Empirical Observations on Text-to-Video Generation From
  A Storytelling Perspective
The Lost Melody: Empirical Observations on Text-to-Video Generation From A Storytelling Perspective
Andrew Shin
Yusuke Mori
Kunitake Kaneko
VGenEGVM
51
2
0
13 May 2024
TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation
TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation
Hritik Bansal
Yonatan Bitton
Michal Yarom
Idan Szpektor
Aditya Grover
Kai-Wei Chang
DiffM
109
12
0
07 May 2024
Is Sora a World Simulator? A Comprehensive Survey on General World
  Models and Beyond
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Zheng Zhu
Xiaofeng Wang
Wangbo Zhao
Chen Min
Nianchen Deng
...
Dawei Zhao
Liang Xiao
Jian-jun Zhao
Jiwen Lu
Guan Huang
VGenLM&Ro
176
48
0
06 May 2024
Video Diffusion Models: A Survey
Video Diffusion Models: A Survey
Andrew Melnik
Michal Ljubljanac
Cong Lu
Qi Yan
Weiming Ren
Helge J. Ritter
VGen
145
16
0
06 May 2024
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion
  Models
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Haomiao Ni
Bernhard Egger
Suhas Lohit
A. Cherian
Ye Wang
T. Koike-Akino
S. X. Huang
Tim K. Marks
DiffM
81
15
0
25 Apr 2024
Beyond Deepfake Images: Detecting AI-Generated Videos
Beyond Deepfake Images: Detecting AI-Generated Videos
Danial Samadi Vahdati
Tai D. Nguyen
Aref Azizpour
Matthew C. Stamm
114
16
0
24 Apr 2024
Previous
123456...8910
Next