LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset

Royden Wagner
Omer Sahin Tas
Jaime Villa
Felix Hauser
Yinzhe Shen
Marlon Steiner
Dominik Strutz
Carlos Fernandez
Christian Kinzig
Guillermo S. Guitierrez-Cabello
Hendrik Königshof
Fabian Immel
Richard Schwarzkopf
Nils Alexander Rack
Kevin Rösch
Kaiwen Wang
Jan-Hendrik Pauls
Martin Lauer
Igor Gilitschenski
Holger Caesar
Christoph Stiller
Main: 11 pages, 7 figures, 10 tables; Bibliography: 4 pages; Appendix: 6 pages
Abstract

In real-world domains such as self-driving, generalization to rare scenarios remains a fundamental challenge. To address this, we introduce a new dataset for end-to-end driving that focuses on long-tail driving events. We provide multi-view video data, trajectories, high-level instructions, and detailed reasoning traces, which facilitate in-context learning and few-shot generalization. The resulting benchmark for multimodal models, such as vision-language models (VLMs) and vision-language-action models (VLAs), goes beyond safety and comfort metrics by also evaluating instruction following and the semantic coherence between model outputs. The multilingual reasoning traces, in English, Spanish, and Chinese, were written by domain experts with diverse cultural backgrounds. Our dataset is therefore a unique resource for studying how different forms of reasoning affect driving competence. Our dataset is available at: this https URL
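As an illustration only, the per-scenario contents listed in the abstract (multi-view video, a trajectory, a high-level instruction, and reasoning traces in three languages) could be held in a record like the sketch below. The class name `LongTailScenario`, every field name, the camera keys, and the coordinate convention are hypothetical assumptions; they do not describe the dataset's actual file format or loader.

```python
from dataclasses import dataclass, field


# Hypothetical container: names and layout are illustrative, NOT the dataset's schema.
@dataclass
class LongTailScenario:
    scenario_id: str
    # Camera name -> ordered list of frame file paths (multi-view video).
    camera_frames: dict[str, list[str]]
    # Ego trajectory as (x, y) waypoints; the coordinate frame is an assumption.
    trajectory: list[tuple[float, float]]
    # High-level driving instruction in natural language.
    instruction: str
    # Language code ("en", "es", "zh") -> reasoning trace text.
    reasoning: dict[str, str] = field(default_factory=dict)

    def languages(self) -> list[str]:
        """Return the sorted language codes that have a reasoning trace."""
        return sorted(self.reasoning)

    def has_all_traces(self, required: tuple[str, ...] = ("en", "es", "zh")) -> bool:
        """Check that a non-empty trace exists for every required language."""
        return all(
            lang in self.reasoning and self.reasoning[lang].strip()
            for lang in required
        )
```

For example, a record with traces in all three languages reports `["en", "es", "zh"]` from `languages()`, while a record without traces fails `has_all_traces()`. A plain dataclass keeps the sketch dependency-free; a real loader would more likely stream video frames lazily rather than list them up front.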
