Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2407.06358
Cited By
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
8 July 2024
Xuan Ju
Yiming Gao
Zhaoyang Zhang
Ziyang Yuan
Xintao Wang
Ailing Zeng
Yu Xiong
Qiang Xu
Ying Shan
VGen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
24 / 24 papers shown
Title
Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge
Chao-Han Huck Yang
Sreyan Ghosh
Qing Wang
Jaeyeon Kim
Hengyi Hong
...
Tianyi Zhou
Gunhee Kim
Jun Du
Rafael Valle
Bryan Catanzaro
33
0
0
12 May 2025
FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios
Shiyi Zhang
Junhao Zhuang
Zhaoyang Zhang
Ying Shan
Yansong Tang
VGen
107
0
0
06 May 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Enxin Song
Wenhao Chai
Weili Xu
Jianwen Xie
Yuxuan Liu
Gaoang Wang
62
0
0
20 Apr 2025
RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements
Guangcong Zheng
Teng Li
Xianpan Zhou
Xi Li
VGen
3DV
69
1
0
11 Apr 2025
AdaWorld: Learning Adaptable World Models with Latent Actions
Shenyuan Gao
Siyuan Zhou
Yilun Du
Jun Zhang
Chuang Gan
VGen
62
3
0
24 Mar 2025
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation
Runze Zhang
Guoguang Du
Xiaochuan Li
Qi Jia
Liang Jin
...
Zhenhua Guo
Yaqian Zhao
Xiaoli Gong
Rengang Li
Baoyu Fan
VGen
73
0
0
08 Mar 2025
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
Wenhao Wang
Yuqing Yang
DiffM
VGen
84
0
0
03 Mar 2025
VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation
Xinlong Chen
Yang Zhang
Chongling Rao
Yushuo Guan
Jiaheng Liu
Fuzheng Zhang
Chengru Song
Qiang Liu
Di Zhang
Tieniu Tan
7
0
0
18 Feb 2025
Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation
Yang Cao
Zhao-quan Song
Chiwun Yang
VGen
46
2
0
01 Feb 2025
Can video generation replace cinematographers? Research on the cinematic language of generated video
Xiaomeng Li
Kai WU
Siyi Yang
YiZhan Qu
Guohua. Zhang
...
Mingliang Xiong
Hao Deng
Qingwen Liu
Gang Li
Bin He
VGen
DiffM
90
1
0
16 Dec 2024
Mojito: Motion Trajectory and Intensity Control for Video Generation
Xuehai He
Shuohang Wang
Jianwei Yang
Xiaoxia Wu
Yixuan Wang
Kuan-Chieh Jackson Wang
Z. Zhan
Olatunji Ruwase
Yelong Shen
Qing Guo
VGen
86
1
0
12 Dec 2024
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Weijia Wu
Mingyu Liu
Zeyu Zhu
Xi Xia
Haoen Feng
Wen Wang
Kevin Qinghong Lin
Chunhua Shen
Mike Zheng Shou
DiffM
VGen
119
1
0
22 Nov 2024
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
Wenhao Wang
Yuqing Yang
VGen
47
3
0
05 Nov 2024
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
Zongyi Li
Shujie Hu
Shujie Liu
Long Zhou
Jeongsoo Choi
Lingwei Meng
Xun Guo
J. Li
H. Ling
Furu Wei
VGen
DiffM
77
5
0
27 Oct 2024
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content
Qiuheng Wang
Yukai Shi
Jiarong Ou
R. J. Chen
Ke Lin
...
Mingwu Zheng
Xin Tao
Fei Yang
Pengfei Wan
Di Zhang
VGen
88
18
0
10 Oct 2024
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design
Jiachen Li
Qian Long
Jian Zheng
Xiaofeng Gao
Robinson Piramuthu
Wenhu Chen
William Yang Wang
VGen
29
22
0
08 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
84
25
0
04 Oct 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
86
178
0
29 Feb 2024
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Haoxin Chen
Yong Zhang
Xiaodong Cun
Menghan Xia
Xintao Wang
Chao-Liang Weng
Ying Shan
VGen
DiffM
120
275
0
17 Jan 2024
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
...
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
158
1,016
0
25 Nov 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM
MLLM
194
591
0
16 Nov 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
272
4,244
0
30 Jan 2023
MUSIQ: Multi-scale Image Quality Transformer
Junjie Ke
Qifei Wang
Yilin Wang
P. Milanfar
Feng Yang
168
624
0
12 Aug 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
304
3,708
0
11 Feb 2021
1