How Real is CARLA's Dynamic Vision Sensor? A Study on the Sim-to-Real Gap in Traffic Object Detection

Event cameras are gaining traction in traffic monitoring applications due to their low latency, high temporal resolution, and energy efficiency, which make them well-suited for real-time object detection at traffic intersections. However, the development of robust event-based detection models is hindered by the limited availability of annotated real-world datasets. To address this, several simulation tools have been developed to generate synthetic event data. Among these, the CARLA driving simulator includes a built-in dynamic vision sensor (DVS) module that emulates event camera output. Despite its potential, the sim-to-real gap for event-based object detection remains insufficiently studied. In this work, we present a systematic evaluation of this gap: we train a recurrent vision transformer model exclusively on synthetic data generated with CARLA's DVS and test it on varying combinations of synthetic and real-world event streams. Our experiments show that models trained solely on synthetic data perform well on synthetic-heavy test sets but degrade significantly as the proportion of real-world data increases. In contrast, models trained on real-world data generalize more strongly across domains. This study offers the first quantifiable analysis of the sim-to-real gap in event-based object detection using CARLA's DVS. Our findings highlight limitations in current DVS simulation fidelity and underscore the need for improved domain adaptation techniques in neuromorphic vision for traffic monitoring.
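For concreteness, the snippet below is a minimal sketch of how CARLA's built-in DVS module can be attached to a fixed roadside viewpoint through CARLA's Python API. It assumes a CARLA server already running on localhost:2000; the mounting pose and contrast thresholds are illustrative placeholders, not the settings used in this study.

    import carla

    # Connect to a running CARLA server (assumed at localhost:2000).
    client = carla.Client('localhost', 2000)
    client.set_timeout(10.0)
    world = client.get_world()

    # CARLA exposes its DVS emulator as a camera blueprint.
    dvs_bp = world.get_blueprint_library().find('sensor.camera.dvs')
    dvs_bp.set_attribute('positive_threshold', '0.3')  # illustrative contrast thresholds
    dvs_bp.set_attribute('negative_threshold', '0.3')

    # Fixed elevated mount, emulating a traffic-monitoring viewpoint (illustrative pose).
    pose = carla.Transform(carla.Location(x=0.0, y=0.0, z=6.0),
                           carla.Rotation(pitch=-20.0))
    dvs = world.spawn_actor(dvs_bp, pose)

    def on_events(event_array):
        # event_array is a carla.DVSEventArray; each event carries x, y, t, pol.
        events = [(e.x, e.y, e.t, e.pol) for e in event_array]
        # ...accumulate events into frames or voxel grids for the detector...

    dvs.listen(on_events)

In the evaluation described above, streams produced this way form the synthetic portion of the test sets, with real-world event recordings substituted in at increasing proportions.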
@article{tan2025_2506.13722,
  title   = {How Real is CARLA's Dynamic Vision Sensor? A Study on the Sim-to-Real Gap in Traffic Object Detection},
  author  = {Kaiyuan Tan and Pavan Kumar B N and Bharatesh Chakravarthi},
  journal = {arXiv preprint arXiv:2506.13722},
  year    = {2025}
}