Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models

29 May 2025
Haohan Chi, Huan-ang Gao, Ziming Liu, Jianing Liu, Chenyu Liu, Jinwei Li, Kaisen Yang, Yangcheng Yu, Zeda Wang, Wenyi Li, Leichen Wang, Xingtao Hu, Hao Sun, Hang Zhao, Hao Zhao
Abstract

Vision-Language-Action (VLA) models for autonomous driving show promise but falter in unstructured corner-case scenarios, largely due to a scarcity of targeted benchmarks. To address this, we introduce Impromptu VLA. Our core contribution is the Impromptu VLA Dataset: over 80,000 meticulously curated video clips, distilled from over 2M source clips drawn from 8 open-source large-scale datasets. The dataset is built upon our novel taxonomy of four challenging unstructured categories and features rich, planning-oriented question-answering annotations and action trajectories. Crucially, experiments demonstrate that VLAs trained with our dataset achieve substantial performance gains on established benchmarks: improved closed-loop NeuroNCAP scores and collision rates, and near state-of-the-art L2 accuracy in open-loop nuScenes trajectory prediction. Furthermore, our Q&A suite serves as an effective diagnostic, revealing clear VLM improvements in perception, prediction, and planning. Our code, data, and models are available at this https URL.
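
The abstract reports near state-of-the-art L2 accuracy on open-loop nuScenes trajectory prediction. As a point of reference, the snippet below is a minimal sketch (not the authors' code) of how such an L2 error is typically computed: the mean Euclidean distance between predicted and ground-truth future waypoints. The 2 Hz sampling, 3 s horizon, and all numeric values are illustrative assumptions.

import numpy as np

# Minimal sketch of the open-loop L2 trajectory metric referenced above:
# mean Euclidean distance between predicted and ground-truth waypoints.
def l2_trajectory_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """pred, gt: arrays of shape (T, 2) holding (x, y) waypoints in meters."""
    assert pred.shape == gt.shape
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Illustrative example: a 3 s horizon sampled at 2 Hz (6 waypoints, values made up).
pred = np.array([[1.0, 0.1], [2.1, 0.2], [3.0, 0.2],
                 [4.2, 0.3], [5.1, 0.4], [6.0, 0.4]])
gt = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0],
               [4.0, 0.0], [5.0, 0.0], [6.0, 0.0]])
print(f"Average L2 error: {l2_trajectory_error(pred, gt):.3f} m")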

View on arXiv
@article{chi2025_2505.23757,
  title={Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models},
  author={Haohan Chi and Huan-ang Gao and Ziming Liu and Jianing Liu and Chenyu Liu and Jinwei Li and Kaisen Yang and Yangcheng Yu and Zeda Wang and Wenyi Li and Leichen Wang and Xingtao Hu and Hao Sun and Hang Zhao and Hao Zhao},
  journal={arXiv preprint arXiv:2505.23757},
  year={2025}
}