
V2V: Scaling Event-Based Vision through Efficient Video-to-Voxel Simulation

Main: 8 pages, Appendix: 12 pages, Bibliography: 3 pages, 18 figures, 8 tables
Abstract

Event-based cameras offer unique advantages such as high temporal resolution, high dynamic range, and low power consumption. However, the massive storage requirements and I/O burden of existing synthetic data generation pipelines, together with the scarcity of real data, prevent event-based training datasets from scaling up, limiting the development and generalization capabilities of event vision models. To address this challenge, we introduce Video-to-Voxel (V2V), an approach that directly converts conventional video frames into event-based voxel grid representations, bypassing storage-intensive event stream generation entirely. V2V reduces storage requirements by 150 times while supporting on-the-fly parameter randomization for enhanced model robustness. Leveraging this efficiency, we train several video reconstruction and optical flow estimation architectures on 10,000 diverse videos totaling 52 hours, an order of magnitude more data than existing event datasets, yielding substantial improvements.
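To make the frame-to-voxel idea concrete, below is a minimal sketch of converting a short grayscale clip directly into a voxel grid of signed event counts. It assumes a simple log-intensity difference model with a fixed contrast threshold and linear binning into a few temporal channels; the function name, parameters, and thresholding scheme are illustrative assumptions, not the paper's actual V2V simulation, which handles parameter randomization, noise, and interpolation in more detail.

```python
# Illustrative sketch only: frame-to-voxel conversion in the spirit of V2V.
# Assumptions (not from the paper): a log-intensity difference model with a
# fixed contrast threshold, and accumulation of signed threshold crossings
# into num_bins temporal channels.
import numpy as np

def frames_to_voxel(frames, num_bins=5, threshold=0.2, eps=1e-3):
    """Convert a (T, H, W) grayscale clip in [0, 1] into a
    (num_bins, H, W) voxel grid of signed event counts."""
    frames = np.asarray(frames, dtype=np.float32)
    T, H, W = frames.shape
    log_frames = np.log(frames + eps)
    voxel = np.zeros((num_bins, H, W), dtype=np.float32)
    for t in range(1, T):
        # Signed number of contrast-threshold crossings between frames.
        delta = log_frames[t] - log_frames[t - 1]
        counts = np.trunc(delta / threshold)
        # Accumulate into the temporal bin covering this frame interval.
        b = min(int((t - 1) * num_bins / (T - 1)), num_bins - 1)
        voxel[b] += counts
    return voxel

# Usage example: a 16-frame clip yields a (5, 128, 128) voxel grid.
# The threshold could be randomized per clip to mimic on-the-fly
# parameter randomization during training.
clip = np.random.rand(16, 128, 128).astype(np.float32)
grid = frames_to_voxel(clip, num_bins=5, threshold=0.2)
print(grid.shape)  # (5, 128, 128)
```

Because only the compact voxel grid is produced and stored, rather than an intermediate per-event stream, this kind of pipeline is what enables the large storage reduction the abstract describes.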

@article{lou2025_2505.16797,
  title={V2V: Scaling Event-Based Vision through Efficient Video-to-Voxel Simulation},
  author={Hanyue Lou and Jinxiu Liang and Minggui Teng and Yi Wang and Boxin Shi},
  journal={arXiv preprint arXiv:2505.16797},
  year={2025}
}