SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving

22 May 2025
Xuesong Chen
Linjiang Huang
Tao Ma
Rongyao Fang
Shaoshuai Shi
Hongsheng Li
Main: 8 pages · 5 figures · 5 tables · Bibliography: 2 pages
Abstract

The integration of Vision-Language Models (VLMs) into autonomous driving systems has shown promise in addressing key challenges such as learning complexity, interpretability, and common-sense reasoning. However, existing approaches often struggle with efficient integration and real-time decision-making due to computational demands. In this paper, we introduce SOLVE, an innovative framework that synergizes VLMs with end-to-end (E2E) models to enhance autonomous vehicle planning. Our approach emphasizes knowledge sharing at the feature level through a shared visual encoder, enabling comprehensive interaction between the VLM and E2E components. We propose a Trajectory Chain-of-Thought (T-CoT) paradigm, which progressively refines trajectory predictions, reducing uncertainty and improving accuracy. A temporal decoupling strategy aligns high-quality VLM outputs with the real-time requirements of the E2E planner, enabling efficient cooperation between the two. Evaluated on the nuScenes dataset, our method demonstrates significant improvements in trajectory prediction accuracy, paving the way for more robust and reliable autonomous driving systems.
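
To make the architecture described in the abstract concrete, below is a minimal PyTorch sketch of how a shared visual encoder, step-wise trajectory refinement in the spirit of T-CoT, and a temporally decoupled slow/fast split could fit together. All module names, dimensions, refinement-step counts, and the caching scheme are illustrative assumptions for this sketch, not the authors' implementation.

# Minimal sketch of a SOLVE-style pipeline as described in the abstract.
# Module names, sizes, and the asynchronous update scheme are assumptions.
import torch
import torch.nn as nn

class SharedVisualEncoder(nn.Module):
    """Single backbone whose features feed both the VLM and E2E branches."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, images):                      # images: (B, 3, H, W)
        x = self.backbone(images).flatten(1)        # (B, 64)
        return self.proj(x)                         # (B, feat_dim)

class VLMBranch(nn.Module):
    """Stand-in for the slow VLM head: refines a coarse trajectory in steps (T-CoT-style)."""
    def __init__(self, feat_dim=256, horizon=6, steps=3):
        super().__init__()
        self.steps = steps
        self.refine = nn.GRUCell(feat_dim + horizon * 2, horizon * 2)

    def forward(self, feat, coarse_traj):           # coarse_traj: (B, horizon*2)
        traj = coarse_traj
        for _ in range(self.steps):                 # progressive refinement
            traj = self.refine(torch.cat([feat, traj], dim=-1), traj)
        return traj

class E2EBranch(nn.Module):
    """Fast end-to-end planner conditioned on the (possibly stale) VLM trajectory."""
    def __init__(self, feat_dim=256, horizon=6):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim + horizon * 2, 256), nn.ReLU(),
            nn.Linear(256, horizon * 2),
        )

    def forward(self, feat, vlm_traj):
        return self.head(torch.cat([feat, vlm_traj], dim=-1))

# Temporal decoupling (illustrative): the VLM branch runs on an earlier frame,
# and its cached output guides the E2E planner on the current frame.
encoder, vlm, e2e = SharedVisualEncoder(), VLMBranch(), E2EBranch()
prev_frame = torch.randn(1, 3, 224, 224)
curr_frame = torch.randn(1, 3, 224, 224)
coarse = torch.zeros(1, 6 * 2)
cached_vlm_traj = vlm(encoder(prev_frame), coarse)      # slow path, computed earlier
final_traj = e2e(encoder(curr_frame), cached_vlm_traj)  # fast path, every frame
print(final_traj.shape)                                  # torch.Size([1, 12])

In this sketch, the decoupling is modeled simply by feeding the E2E branch a trajectory cached from a previous frame; in practice the two branches would run at different rates, with the shared encoder providing the feature-level knowledge sharing the abstract describes.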

@article{chen2025_2505.16805,
  title={SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving},
  author={Xuesong Chen and Linjiang Huang and Tao Ma and Rongyao Fang and Shaoshuai Shi and Hongsheng Li},
  journal={arXiv preprint arXiv:2505.16805},
  year={2025}
}