
V2X-UniPool: Unifying Multimodal Perception and Knowledge Reasoning for Autonomous Driving

Comments: 8 pages (main) + 4 pages (bibliography), 2 figures, 3 tables
Abstract

Knowledge-driven autonomous driving systems (ADs) offer powerful reasoning capabilities but face two critical challenges: limited perception caused by the short range of single-vehicle sensors, and hallucination arising from the lack of real-time environmental grounding. To address these issues, this paper introduces V2X-UniPool, a unified framework that integrates multimodal Vehicle-to-Everything (V2X) data into a time-indexed, language-based knowledge pool. Through a dual-query Retrieval-Augmented Generation (RAG) mechanism that retrieves both static and dynamic knowledge, the framework enables ADs to perform accurate, temporally consistent reasoning over both the static environment and the dynamic traffic context. Experiments on a real-world cooperative driving dataset demonstrate that V2X-UniPool significantly enhances motion planning accuracy and reasoning capability. Remarkably, it enables even zero-shot vehicle-side models to achieve state-of-the-art performance, while reducing transmission cost by over 99.9% compared to prior V2X methods.
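As a rough illustration of the dual-query retrieval described above, the Python sketch below separates the knowledge pool into static entries (road layout, signage) and time-indexed dynamic entries (per-frame traffic state), filtering the latter by a recency window around the current timestamp. All names here (Entry, dual_query_retrieve, the toy word-overlap score) are hypothetical stand-ins, not the paper's implementation; a real system would use embedding-based retrieval before handing the retrieved text to the vehicle-side language model.

from dataclasses import dataclass

@dataclass
class Entry:
    text: str          # language-based description of V2X perception
    timestamp: float   # acquisition time; static facts use 0.0 here

# Hypothetical knowledge pool: static entries (road layout, signage) and
# dynamic entries (per-frame traffic state), both stored as natural language.
STATIC_POOL = [
    Entry("Four-way signalized intersection; speed limit 40 km/h.", 0.0),
]
DYNAMIC_POOL = [
    Entry("t=12.3s: pedestrian crossing from the north-east corner.", 12.3),
    Entry("t=12.4s: oncoming truck 35 m ahead in the left lane.", 12.4),
]

def score(query: str, entry: Entry) -> float:
    """Toy relevance score via word overlap; a real system would use embeddings."""
    q, e = set(query.lower().split()), set(entry.text.lower().split())
    return len(q & e) / (len(q) or 1)

def dual_query_retrieve(query: str, now: float, window: float = 0.5, k: int = 2):
    """Dual query: static knowledge ranked by relevance, dynamic knowledge
    ranked by relevance within a recency window around the current time."""
    static_hits = sorted(STATIC_POOL, key=lambda e: -score(query, e))[:k]
    recent = [e for e in DYNAMIC_POOL if abs(e.timestamp - now) <= window]
    dynamic_hits = sorted(recent, key=lambda e: -score(query, e))[:k]
    return static_hits, dynamic_hits

static_ctx, dynamic_ctx = dual_query_retrieve(
    "Is it safe to turn left at the intersection?", now=12.4)
prompt = "\n".join(e.text for e in static_ctx + dynamic_ctx)
print(prompt)  # grounded context for the vehicle-side LLM planner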

@article{luo2025_2506.02580,
  title={V2X-UniPool: Unifying Multimodal Perception and Knowledge Reasoning for Autonomous Driving},
  author={Xuewen Luo and Fengze Yang and Fan Ding and Xiangbo Gao and Shuo Xing and Yang Zhou and Zhengzhong Tu and Chenxi Liu},
  journal={arXiv preprint arXiv:2506.02580},
  year={2025}
}