Robust Multimodal Segmentation with Representation Regularization and Hybrid Prototype Distillation

Main: 9 pages, 4 figures, 9 tables; Bibliography: 4 pages; Appendix: 2 pages
Abstract

Multi-modal semantic segmentation (MMSS) faces significant challenges in real-world scenarios due to dynamic environments, sensor failures, and noise interference, creating a gap between theoretical models and practical performance. To address this, we propose a two-stage framework called RobustSeg, which enhances multi-modal robustness through two key components: the Hybrid Prototype Distillation Module (HPDM) and the Representation Regularization Module (RRM). In the first stage, RobustSeg pre-trains a multi-modal teacher model using complete modalities. In the second stage, a student model is trained with random modality dropout while learning from the teacher via HPDM and RRM. HPDM transforms features into compact prototypes, enabling cross-modal hybrid knowledge distillation and mitigating bias from missing modalities. RRM reduces representation discrepancies between the teacher and student by optimizing functional entropy through the log-Sobolev inequality. Extensive experiments on three public benchmarks demonstrate that RobustSeg outperforms previous state-of-the-art methods, achieving improvements of +2.76%, +4.56%, and +0.98%, respectively. Code is available at: this https URL.
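
For readers who want a concrete picture of the second training stage, below is a minimal PyTorch sketch of its two core ideas: random modality dropout on the student's inputs, and cross-modal prototype distillation from the frozen teacher. It assumes masked average pooling to build per-class prototypes and an MSE matching loss; the function names here (modality_dropout, masked_average_prototypes, hybrid_prototype_distillation) are hypothetical, and the RRM's log-Sobolev functional-entropy regularizer is omitted, so this illustrates the general technique rather than the paper's exact implementation.

import torch
import torch.nn.functional as F

# Illustrative sketch only, not the authors' code. Prototype construction
# uses masked average pooling (an assumption); the paper's HPDM may differ.

def masked_average_prototypes(feats, labels, num_classes):
    # feats: (B, C, H, W) feature map; labels: (B, H, W) integer class map.
    # Returns one C-dimensional prototype per class via masked average pooling.
    protos = feats.new_zeros(num_classes, feats.shape[1])
    for k in range(num_classes):
        mask = (labels == k).unsqueeze(1).float()        # (B, 1, H, W)
        denom = mask.sum().clamp(min=1.0)                # avoid divide-by-zero
        protos[k] = (feats * mask).sum(dim=(0, 2, 3)) / denom
    return protos                                        # (num_classes, C)

def hybrid_prototype_distillation(teacher_feats, student_feats, labels, num_classes):
    # "Hybrid" cross-modal distillation: each student modality's prototypes
    # are matched against a randomly chosen teacher modality's prototypes.
    modalities = list(teacher_feats.keys())
    loss = 0.0
    for m, sf in student_feats.items():
        mt = modalities[torch.randint(len(modalities), (1,)).item()]
        p_t = masked_average_prototypes(teacher_feats[mt], labels, num_classes)
        p_s = masked_average_prototypes(sf, labels, num_classes)
        loss = loss + F.mse_loss(p_s, p_t.detach())      # teacher is frozen
    return loss

def modality_dropout(inputs, p=0.3):
    # Randomly zero out whole modalities on the student side,
    # always keeping at least one modality active.
    keys = list(inputs.keys())
    kept = [k for k in keys if torch.rand(1).item() > p]
    if not kept:
        kept = [keys[torch.randint(len(keys), (1,)).item()]]
    return {k: v if k in kept else torch.zeros_like(v) for k, v in inputs.items()}

if __name__ == "__main__":
    # Toy example: two modalities, 4-class labels, 8x8 feature maps.
    B, C, H, W, K = 2, 16, 8, 8, 4
    labels = torch.randint(K, (B, H, W))
    teacher = {"rgb": torch.randn(B, C, H, W), "depth": torch.randn(B, C, H, W)}
    dropped = modality_dropout(teacher)                  # simulate missing modalities
    student = {k: v + 0.1 * torch.randn_like(v) for k, v in dropped.items()}
    print(float(hybrid_prototype_distillation(teacher, student, labels, K)))

In a full pipeline this distillation term would be added to the segmentation loss and the RRM regularizer, with the teacher kept frozen after stage one.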

@article{tan2025_2505.12861,
  title={Robust Multimodal Segmentation with Representation Regularization and Hybrid Prototype Distillation},
  author={Jiaqi Tan and Xu Zheng and Yang Liu},
  journal={arXiv preprint arXiv:2505.12861},
  year={2025}
}