MambaBEV: An efficient 3D detection model with Mamba2

Accurate 3D object detection in autonomous driving relies on Bird's Eye View (BEV) perception and effective temporal fusion. However, existing fusion strategies based on convolutional layers or deformable self-attention struggle with global context modeling in BEV space, leading to lower accuracy for large objects. To address this, we introduce MambaBEV, a novel BEV-based 3D object detection model that leverages Mamba2, an advanced state space model (SSM) optimized for long-sequence processing. Our key contribution is TemporalMamba, a temporal fusion module that enhances global awareness by introducing a BEV feature discrete rearrangement mechanism tailored for Mamba's sequential processing. Additionally, we propose a Mamba-based DETR as the detection head to improve multi-object representation. Evaluations on the nuScenes dataset demonstrate that MambaBEV-base achieves an NDS of 51.7% and an mAP of 42.7%. Furthermore, an end-to-end autonomous driving paradigm validates its effectiveness in motion forecasting and planning. Our results highlight the potential of SSMs in autonomous driving perception, particularly in enhancing global context understanding and large object detection.
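Because an SSM such as Mamba consumes a 1D sequence, a 2D BEV grid has to be serialized before it can be processed and then mapped back afterwards. The abstract does not specify how the discrete rearrangement mechanism does this, so the following is only a minimal sketch of the general idea, assuming four simple scan orders (row-major and column-major, each forward and backward). The helper names `scan_orders`, `flatten`, and `restore` are hypothetical and are not taken from the paper or from any Mamba library.

```python
from typing import Dict, List, Tuple

Index = Tuple[int, int]


def scan_orders(height: int, width: int) -> Dict[str, List[Index]]:
    """Build four cell orderings over an H x W BEV grid (an assumed choice)."""
    row_major = [(r, c) for r in range(height) for c in range(width)]
    col_major = [(r, c) for c in range(width) for r in range(height)]
    return {
        "row_forward": row_major,
        "row_backward": row_major[::-1],
        "col_forward": col_major,
        "col_backward": col_major[::-1],
    }


def flatten(grid: List[List[int]], order: List[Index]) -> List[int]:
    """Serialize the grid into a 1D sequence following the given ordering."""
    return [grid[r][c] for r, c in order]


def restore(seq: List[int], order: List[Index], height: int, width: int) -> List[List[int]]:
    """Invert flatten: place each sequence element back at its grid cell."""
    grid = [[0] * width for _ in range(height)]
    for value, (r, c) in zip(seq, order):
        grid[r][c] = value
    return grid


# Toy 2 x 3 "BEV feature map" with one scalar per cell.
bev = [[0, 1, 2], [3, 4, 5]]
orders = scan_orders(2, 3)
sequences = {name: flatten(bev, order) for name, order in orders.items()}
restored = {name: restore(sequences[name], orders[name], 2, 3) for name in orders}
```

In a real model each cell would hold a feature vector, each sequence would pass through a sequence layer, and the restored grids would be merged (for example by averaging). Those steps are omitted here because the abstract does not describe them.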
@article{you2025_2410.12673,
  title={MambaBEV: An efficient 3D detection model with Mamba2},
  author={Zihan You and Ni Wang and Hao Wang and Qichao Zhao and Jinxiang Wang},
  journal={arXiv preprint arXiv:2410.12673},
  year={2025}
}