Sekai: A Video Dataset towards World Exploration

Zhen Li
Chuanhao Li
Xiaofeng Mao
Shaoheng Lin
Ming Li
Shitian Zhao
Zhaopan Xu
Xinyue Li
Yukang Feng
Jianwen Sun
Zizhen Li
Fanrui Zhang
Jiaxin Ai
Zhixiang Wang
Yuwei Wu
Tong He
Jiangmiao Pang
Yu Qiao
Yunde Jia
Kaipeng Zhang
Main: 8 pages, 6 figures; bibliography: 4 pages
Abstract

Video generation techniques have made remarkable progress, promising to be the foundation of interactive world exploration. However, existing video generation datasets are not well suited for world-exploration training, as they suffer from several limitations: restricted locations, short durations, static scenes, and a lack of annotations about exploration and the world. In this paper, we introduce Sekai (meaning ``world'' in Japanese), a high-quality first-person-view worldwide video dataset with rich annotations for world exploration. It consists of over 5,000 hours of walking and drone-view (FPV and UAV) videos from over 100 countries and regions across 750 cities. We develop an efficient and effective toolbox to collect, pre-process, and annotate videos with location, scene, weather, crowd density, captions, and camera trajectories. Experiments demonstrate the quality of the dataset. Moreover, we use a subset to train an interactive video world-exploration model, named YUME (meaning ``dream'' in Japanese). We believe Sekai will benefit the areas of video generation and world exploration, and will motivate valuable applications.

@article{li2025_2506.15675,
  title={Sekai: A Video Dataset towards World Exploration},
  author={Zhen Li and Chuanhao Li and Xiaofeng Mao and Shaoheng Lin and Ming Li and Shitian Zhao and Zhaopan Xu and Xinyue Li and Yukang Feng and Jianwen Sun and Zizhen Li and Fanrui Zhang and Jiaxin Ai and Zhixiang Wang and Yuwei Wu and Tong He and Jiangmiao Pang and Yu Qiao and Yunde Jia and Kaipeng Zhang},
  journal={arXiv preprint arXiv:2506.15675},
  year={2025}
}