RefAV: Towards Planning-Centric Scenario Mining

27 May 2025

Cainan Davidson

Deva Ramanan

Neehar Peri


Main: 9 pages, Bibliography: 5 pages, Appendix: 17 pages, 7 figures, 6 tables

Abstract

Autonomous vehicles (AVs) collect and pseudo-label terabytes of multi-modal data localized to HD maps during normal fleet testing. However, identifying interesting and safety-critical scenarios in uncurated driving logs remains a significant challenge. Traditional scenario mining techniques are error-prone and prohibitively time-consuming, often relying on hand-crafted structured queries. In this work, we revisit spatio-temporal scenario mining through the lens of recent vision-language models (VLMs) to detect whether a described scenario occurs in a driving log and, if so, precisely localize it in both time and space. To address this problem, we introduce RefAV, a large-scale dataset of 10,000 diverse natural language queries that describe complex multi-agent interactions relevant to motion planning, derived from 1,000 driving logs in the Argoverse 2 Sensor dataset. We evaluate several referential multi-object trackers and present an empirical analysis of our baselines. Notably, we find that naively repurposing off-the-shelf VLMs yields poor performance, suggesting that scenario mining presents unique challenges. Our code and dataset are available at this https URL and this https URL.


@article{davidson2025_2505.20981,
  title={RefAV: Towards Planning-Centric Scenario Mining},
  author={Cainan Davidson and Deva Ramanan and Neehar Peri},
  journal={arXiv preprint arXiv:2505.20981},
  year={2025}
}
