MambaVSR: Content-Aware Scanning State Space Model for Video Super-Resolution

13 June 2025

Main: 5 pages; 5 figures; bibliography: 3 pages; 4 tables

Abstract

Video super-resolution (VSR) faces critical challenges in effectively modeling non-local dependencies across misaligned frames while preserving computational efficiency. Existing VSR methods typically rely on optical flow strategies or transformer architectures, which struggle with large motion displacements and long video sequences. To address this, we propose MambaVSR, the first state-space model framework for VSR that incorporates an innovative content-aware scanning mechanism. Unlike rigid 1D sequential processing in conventional vision Mamba methods, our MambaVSR enables dynamic spatiotemporal interactions through the Shared Compass Construction (SCC) and the Content-Aware Sequentialization (CAS). Specifically, the SCC module constructs intra-frame semantic connectivity graphs via efficient sparse attention and generates adaptive spatial scanning sequences through spectral clustering. Building upon SCC, the CAS module effectively aligns and aggregates non-local similar content across multiple frames by interleaving temporal features along the learned spatial order. To bridge global dependencies with local details, the Global-Local State Space Block (GLSSB) synergistically integrates window self-attention operations with SSM-based feature propagation, enabling high-frequency detail recovery under global dependency guidance. Extensive experiments validate MambaVSR's superiority, outperforming the Transformer-based method by 0.58 dB PSNR on the REDS dataset with 55% fewer parameters.
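The scanning idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes dense cosine similarity with top-k pruning in place of the paper's sparse attention, and a spectral ordering (sorting tokens by the Fiedler vector of the graph Laplacian) in place of full spectral clustering. It also assumes the order is computed on one reference frame and reused for every frame. The names `content_aware_order` and `interleave_frames` and the parameter `k` are hypothetical.

```python
import numpy as np

def content_aware_order(feats, k=4):
    """Return a permutation of N tokens that places similar content nearby.

    feats: (N, C) token features of one reference frame (assumed input).
    """
    n = feats.shape[0]
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed.T                 # cosine similarity between tokens
    np.fill_diagonal(sim, -np.inf)          # exclude self-matches
    nbrs = np.argsort(-sim, axis=1)[:, :k]  # top-k neighbours per token
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    adj = np.zeros((n, n))
    adj[rows, cols] = np.clip(sim[rows, cols], 0.0, None)
    adj = np.maximum(adj, adj.T)            # symmetrise the sparse graph
    lap = np.diag(adj.sum(axis=1)) - adj    # unnormalised graph Laplacian
    _, vecs = np.linalg.eigh(lap)           # eigenvalues in ascending order
    fiedler = vecs[:, 1]                    # second-smallest eigenvector
    return np.argsort(fiedler, kind="stable")

def interleave_frames(frames, order):
    """Build one 1D scan sequence from T frames sharing one spatial order.

    frames: (T, N, C); order: (N,) permutation.
    Returns (N*T, C): for each position in `order`, the tokens of all
    T frames appear consecutively.
    """
    t, n, c = frames.shape
    return frames[:, order, :].transpose(1, 0, 2).reshape(n * t, c)

rng = np.random.default_rng(0)
frames = rng.standard_normal((3, 16, 8))    # T=3 frames, N=16 tokens, C=8
order = content_aware_order(frames[0])
sequence = interleave_frames(frames, order)
```

In this sketch, row `i * T + f` of `sequence` holds the token of frame `f` at spatial position `order[i]`, so a 1D state space model scanning the sequence would see matching positions across frames side by side.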

@article{he2025_2506.11768,
  title={MambaVSR: Content-Aware Scanning State Space Model for Video Super-Resolution},
  author={Linfeng He and Meiqin Liu and Qi Tang and Chao Yao and Yao Zhao},
  journal={arXiv preprint arXiv:2506.11768},
  year={2025}
}
