
X-Driver: Explainable Autonomous Driving with Vision-Language Models

Main: 7 pages; Bibliography: 1 page; 4 figures; 4 tables
Abstract

End-to-end autonomous driving has advanced significantly, offering benefits such as system simplicity and stronger driving performance, in both open-loop and closed-loop settings, than conventional pipelines. However, existing frameworks still suffer from low success rates in closed-loop evaluations, highlighting their limitations for real-world deployment. In this paper, we introduce X-Driver, a unified multi-modal large language model (MLLM) framework designed for closed-loop autonomous driving, leveraging Chain-of-Thought (CoT) prompting and autoregressive modeling to enhance perception and decision-making. We validate X-Driver across multiple autonomous driving tasks using public benchmarks in the CARLA simulation environment, including Bench2Drive [6]. Our experimental results demonstrate superior closed-loop performance, surpassing the current state-of-the-art (SOTA) while improving the interpretability of driving decisions. These findings underscore the importance of structured reasoning in end-to-end driving and establish X-Driver as a strong baseline for future research in closed-loop autonomous driving.
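The abstract describes pairing an MLLM with Chain-of-Thought reasoning so that perception is made explicit before a driving decision is emitted. A minimal sketch of what such a structured CoT prompt could look like is shown below; the function name, step labels, and action vocabulary are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch: a structured Chain-of-Thought prompt template in the
# spirit of X-Driver, asking the model to reason about perception before
# committing to a driving action. Names and action set are assumptions.

def build_cot_prompt(scene_description: str) -> str:
    """Compose a perception-then-decision CoT prompt for a driving MLLM."""
    return (
        "You are an autonomous driving assistant.\n"
        f"Scene: {scene_description}\n"
        "Step 1 - Perception: list the key objects and their states.\n"
        "Step 2 - Reasoning: explain how they constrain the ego vehicle.\n"
        "Step 3 - Decision: output exactly one action from "
        "{keep_lane, slow_down, stop, turn_left, turn_right}."
    )

if __name__ == "__main__":
    prompt = build_cot_prompt("pedestrian crossing 10 m ahead, green light")
    print(prompt)
```

Structuring the prompt this way exposes the intermediate perception and reasoning steps in the model's output, which is what makes the final decision interpretable rather than a bare action label.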

@article{liu2025_2505.05098,
  title={X-Driver: Explainable Autonomous Driving with Vision-Language Models},
  author={Wei Liu and Jiyuan Zhang and Binxiong Zheng and Yufeng Hu and Yingzhan Lin and Zengfeng Zeng},
  journal={arXiv preprint arXiv:2505.05098},
  year={2025}
}
