
TransDiffuser: End-to-end Trajectory Generation with Decorrelated Multi-modal Representation for Autonomous Driving

Abstract

In recent years, diffusion models have shown their potential across diverse domains, from vision generation to language modeling. Transferring their capabilities to modern autonomous driving systems has also emerged as a promising direction. In this work, we propose TransDiffuser, an encoder-decoder-based generative trajectory planning model for end-to-end autonomous driving. The encoded scene information serves as the multi-modal conditional input of the denoising decoder. To tackle mode collapse when generating diverse, high-quality trajectories, we introduce a simple yet effective multi-modal representation decorrelation optimization mechanism during training. TransDiffuser achieves a PDMS of 94.85 on the NAVSIM benchmark, surpassing previous state-of-the-art methods without any anchor-based prior trajectories.
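
The abstract does not give the exact form of the decorrelation mechanism. As a rough illustration only, the sketch below shows one common way such a regularizer is written: penalize the off-diagonal entries of the correlation matrix computed over a batch of feature vectors. The function name `decorrelation_penalty`, the `eps` parameter, and the choice of a mean-squared off-diagonal penalty are assumptions made for this example and are not taken from the paper.

```python
import math


def decorrelation_penalty(features, eps=1e-8):
    """Mean squared off-diagonal entry of the feature correlation matrix.

    Hypothetical helper (not the paper's formulation).
    features: list of samples, each a list of d floats.
    Returns 0.0 when all feature dimensions are pairwise uncorrelated.
    """
    n, d = len(features), len(features[0])
    if d < 2:
        return 0.0

    # Standardize each dimension across the batch (zero mean, unit variance).
    cols = []
    for j in range(d):
        col = [row[j] for row in features]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        cols.append([(x - mean) / (std + eps) for x in col])

    # Accumulate squared correlations between distinct dimensions.
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                corr = sum(a * b for a, b in zip(cols[i], cols[j])) / n
                total += corr ** 2
    return total / (d * (d - 1))
```

In a training loop, a term like this would typically be added to the main objective with a small weight, so that the feature dimensions are pushed toward carrying non-redundant information. Whether TransDiffuser applies its mechanism in this way is not stated in the abstract.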

@article{jiang2025_2505.09315,
  title={TransDiffuser: End-to-end Trajectory Generation with Decorrelated Multi-modal Representation for Autonomous Driving},
  author={Xuefeng Jiang and Yuan Ma and Pengxiang Li and Leimeng Xu and Xin Wen and Kun Zhan and Zhongpu Xia and Peng Jia and XianPeng Lang and Sheng Sun},
  journal={arXiv preprint arXiv:2505.09315},
  year={2025}
}