Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers

31 May 2025
Kazuki Irie
Morris Yau
Samuel J. Gershman
Main: 10 pages · 2 figures · 10 tables · Bibliography: 5 pages · Appendix: 4 pages
Abstract

We develop hybrid memory architectures for general-purpose sequence processing neural networks that combine key-value memory using softmax attention (KV-memory) with dynamic synaptic memory through fast-weight programming (FW-memory) -- the core principles of quadratic and linear transformers, respectively. These two memory systems have complementary but individually limited properties: KV-memory offers precise retrieval but is constrained by quadratic complexity in sequence length, while FW-memory supports arbitrarily long sequences and enables more expressive computation but sacrifices precise recall. We propose and compare three methods to blend these two systems into a single memory system that leverages the strengths of both. We conduct experiments on general language modeling and retrieval tasks by training 340M- and 1.3B-parameter models from scratch, as well as on synthetic algorithmic tasks designed to precisely illustrate the benefits of certain hybrid methods over others. We also evaluate our hybrid memory systems on reinforcement learning in partially observable environments. Overall, we demonstrate how a well-designed hybrid can overcome the limitations of its individual components, offering new insights into the design principles of neural memory systems.
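
To make the two memory systems concrete, below is a minimal sketch (not the authors' implementation; the function names, the elu+1 feature map, and the simple convex-gate blend are illustrative assumptions) of a causal softmax KV-memory, an outer-product fast-weight FW-memory, and one possible way to mix their readouts.

# Illustrative sketch only: blends a softmax key-value memory with an
# outer-product fast-weight memory; names and the gating scheme are assumptions.
import torch
import torch.nn.functional as F


def kv_memory(q, k, v):
    """Causal softmax attention over the prefix (KV-memory, quadratic cost)."""
    T = q.shape[-2]
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v


def fw_memory(q, k, v):
    """Linear-transformer-style fast-weight memory (FW-memory): a matrix W is
    written with outer products k_t v_t^T and read with the query q_t."""
    phi = lambda x: F.elu(x) + 1.0                       # positive feature map (assumption)
    T, d = q.shape[-2], q.shape[-1]
    W = torch.zeros(*q.shape[:-2], d, v.shape[-1])       # fixed-size matrix state
    out = []
    for t in range(T):
        k_t, v_t, q_t = phi(k[..., t, :]), v[..., t, :], phi(q[..., t, :])
        W = W + k_t.unsqueeze(-1) * v_t.unsqueeze(-2)    # write: outer-product update
        out.append(q_t.unsqueeze(-2) @ W)                # read: query the fast weights
    return torch.cat(out, dim=-2)


def hybrid_memory(q, k, v, gate=0.5):
    """One possible blend: a convex mix of the two memory readouts."""
    return gate * kv_memory(q, k, v) + (1.0 - gate) * fw_memory(q, k, v)


if __name__ == "__main__":
    q = k = v = torch.randn(2, 16, 32)                   # (batch, time, dim)
    print(hybrid_memory(q, k, v).shape)                  # torch.Size([2, 16, 32])

The softmax path costs O(T^2) in sequence length but retrieves stored values precisely, while the fast-weight path runs in O(T) with a fixed-size matrix state at the cost of exact recall; balancing this trade-off is what the paper's hybrid designs address.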

@article{irie2025_2506.00744,
  title={Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers},
  author={Kazuki Irie and Morris Yau and Samuel J. Gershman},
  journal={arXiv preprint arXiv:2506.00744},
  year={2025}
}