Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2506.13691
Cited By
UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions
16 June 2025
Zhucun Xue
Jiangning Zhang
T. Hu
Haoyang He
Yinan Chen
Yuxuan Cai
Yabiao Wang
Chengjie Wang
Yong-Jin Liu
Xiangtai Li
Dacheng Tao
VGen
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions"
14 / 14 papers shown
Title
MAGI-1: Autoregressive Video Generation at Scale
Sand.ai
Hansi Teng
Hongyu Jia
Lei Sun
Lingzhi Li
...
Xiaoyu Yang
Xin Song
Xuan Hu
Y. Zhang
Yuqiao Li
DiffM
VGen
VLM
27
8
0
19 May 2025
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
327
685
0
20 Feb 2025
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
Guoqing Ma
Haoyang Huang
K. Yan
L. Chen
Nan Duan
...
Yansen Wang
Yuanwei Lu
Yu-Cheng Chen
Yu-Juan Luo
Yihao Luo
DiffM
VGen
276
40
0
14 Feb 2025
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content
Qiuheng Wang
Yukai Shi
Jiarong Ou
Ruoxin Chen
Ke Lin
...
Mingwu Zheng
Xin Tao
Fei Yang
Pengfei Wan
Di Zhang
VGen
137
33
0
10 Oct 2024
Loong: Generating Minute-level Long Videos with Autoregressive Language Models
Yuqing Wang
Tianwei Xiong
Daquan Zhou
Zhijie Lin
Yang Zhao
Bingyi Kang
Jiashi Feng
Xihui Liu
VGen
136
34
0
03 Oct 2024
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang
Jiayan Teng
Wendi Zheng
Ming Ding
Shiyu Huang
...
Weihan Wang
Yean Cheng
Xiaotao Gu
Yuxiao Dong
Jie Tang
DiffM
VGen
233
557
0
12 Aug 2024
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
Kepan Nan
Rui Xie
Penghao Zhou
Tiehan Fan
Zhenheng Yang
Zhijie Chen
Xiang Li
Jian Yang
Ying Tai
135
92
0
02 Jul 2024
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Jay Zhangjie Wu
Yixiao Ge
Xintao Wang
Weixian Lei
Yuchao Gu
Yufei Shi
Wynne Hsu
Ying Shan
Xiaohu Qie
Mike Zheng Shou
VGen
119
743
0
22 Dec 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
474
15,734
0
20 Dec 2021
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Hongwei Xue
Tiankai Hang
Yanhong Zeng
Yuchong Sun
Bei Liu
Huan Yang
Jianlong Fu
B. Guo
AI4TS
VLM
70
194
0
19 Nov 2021
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
243
1,441
0
03 Nov 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
967
29,810
0
26 Feb 2021
RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
Zachary Teed
Jia Deng
MDE
244
2,644
0
26 Mar 2020
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
118
1,207
0
07 Jun 2019
1