ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2408.00244
23
0

Enhanced Structured State Space Models via Grouped FIR Filtering and Attention Sink Mechanisms

1 August 2024
Yueran Zhang
Yating Yu
Lingtong Min
    Mamba
ArXivPDFHTML
Abstract

Structured State Space Models (SSMs) have emerged as compelling alternatives to Transformer architectures, offering linear-time complexity and superior performance in various sequence modeling tasks. Despite their advantages, SSMs like the original Mamba-2 face training difficulties due to the sensitivities introduced by the extended series of recurrent matrix multiplications. In this paper, we propose an advanced architecture that mitigates these challenges by decomposing A-multiplications into multiple groups and optimizing positional encoding through Grouped Finite Impulse Response (FIR) filtering. This new structure, denoted as Grouped FIR-enhanced SSM (GFSSM), employs semiseparable matrices for efficient computation. Furthermore, inspired by the "attention sink" phenomenon identified in streaming language models, we incorporate a similar mechanism to enhance the stability and performance of our model over extended sequences. Our approach further bridges the gap between SSMs and Transformer architectures, offering a viable path forward for scalable and high-performing sequence modeling.

View on arXiv
Comments on this paper