Breaking Quadratic Barriers: A Non-Attention LLM for Ultra-Long Context Horizons

9 May 2025
Andrew Kiruluta
Preethi Raju
Priscilla Burity
arXiv (abs) · PDF · HTML
Main: 6 pages · 1 figure · 2 tables · Appendix: 30 pages
Abstract

We present a novel non-attention-based architecture for large language models (LLMs) that efficiently handles very long context windows, on the order of hundreds of thousands to potentially millions of tokens. Unlike traditional Transformer designs, which suffer from quadratic memory and computation overhead due to the nature of the self-attention mechanism, our model avoids token-to-token attention entirely. Instead, it combines the following complementary components: State-Space blocks (inspired by S4) that learn continuous-time convolution kernels and scale near-linearly with sequence length, Multi-Resolution Convolution layers that capture local context at different dilation levels, a lightweight Recurrent Supervisor that maintains a global hidden state across sequential chunks, and Retrieval-Augmented External Memory that stores and retrieves high-level chunk embeddings without reintroducing quadratic operations.
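The page gives no implementation details beyond the abstract. As a minimal sketch of how the four components could compose, the code below assumes placeholder hyperparameters, an FFT-based learned long convolution standing in for a full S4 block, a GRU cell as the recurrent supervisor, and mean-pooled chunk embeddings retrieved by cosine similarity as the external memory. Every name, dimension, and design choice here is illustrative and not taken from the paper.

# Hypothetical sketch only -- module names, dimensions, and hyperparameters
# are placeholders for illustration and do not come from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiResolutionConv(nn.Module):
    # Parallel depthwise convolutions at several dilation rates (local context).
    def __init__(self, dim, dilations=(1, 2, 4, 8), kernel_size=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size, padding=d * (kernel_size - 1) // 2,
                      dilation=d, groups=dim)
            for d in dilations
        )
        self.proj = nn.Linear(dim * len(dilations), dim)

    def forward(self, x):                       # x: (batch, seq, dim)
        h = x.transpose(1, 2)                   # -> (batch, dim, seq)
        h = torch.cat([c(h) for c in self.convs], dim=1)
        return self.proj(h.transpose(1, 2))     # back to (batch, seq, dim)

class LongConvBlock(nn.Module):
    # Stand-in for an S4-style state-space block: a learned global convolution
    # applied via FFT, scaling roughly O(L log L) in the sequence length L.
    def __init__(self, dim, max_len):
        super().__init__()
        self.kernel = nn.Parameter(torch.randn(dim, max_len) * 0.02)

    def forward(self, x):                       # x: (batch, seq, dim)
        L = x.size(1)
        n = 2 * L                               # zero-pad to avoid circular wrap-around
        xf = torch.fft.rfft(x.transpose(1, 2), n=n)
        kf = torch.fft.rfft(self.kernel[:, :L], n=n)
        y = torch.fft.irfft(xf * kf, n=n)[..., :L]
        return y.transpose(1, 2) + x            # residual connection

class NonAttentionLM(nn.Module):
    # Chunks the input, runs the SSM stand-in plus multi-resolution convs inside
    # each chunk, carries a global state across chunks with a GRU "supervisor",
    # and keeps an external memory of chunk embeddings queried by cosine similarity.
    def __init__(self, vocab_size=32000, dim=256, chunk_len=1024, top_k=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.ssm = LongConvBlock(dim, chunk_len)
        self.local = MultiResolutionConv(dim)
        self.supervisor = nn.GRUCell(dim, dim)
        self.fuse = nn.Linear(3 * dim, dim)
        self.head = nn.Linear(dim, vocab_size)
        self.chunk_len, self.top_k = chunk_len, top_k

    def forward(self, tokens):                  # tokens: (batch, total_len)
        state = torch.zeros(tokens.size(0), self.embed.embedding_dim, device=tokens.device)
        memory, logits = [], []
        for chunk in tokens.split(self.chunk_len, dim=1):
            h = self.local(self.ssm(self.embed(chunk)))
            summary = h.mean(dim=1)             # one embedding per chunk
            if memory:                          # retrieve top-k past chunk summaries
                mem = torch.stack(memory, dim=1)                    # (batch, n_chunks, dim)
                sims = F.cosine_similarity(mem, summary.unsqueeze(1), dim=-1)
                idx = sims.topk(min(self.top_k, mem.size(1)), dim=1).indices
                retrieved = mem.gather(1, idx.unsqueeze(-1).expand(-1, -1, mem.size(-1))).mean(1)
            else:
                retrieved = torch.zeros_like(summary)
            state = self.supervisor(summary, state)                 # global hidden state
            ctx = torch.cat([state, retrieved], dim=-1)
            ctx = ctx.unsqueeze(1).expand(-1, h.size(1), -1)
            h = self.fuse(torch.cat([h, ctx], dim=-1))
            logits.append(self.head(h))
            memory.append(summary.detach())
        return torch.cat(logits, dim=1)

# Toy usage: a 4,096-token input processed as four 1,024-token chunks.
model = NonAttentionLM()
out = model(torch.randint(0, 32000, (1, 4096)))   # -> (1, 4096, 32000) logits

The point of the sketch is the composition and the absence of any token-to-token attention: cost per chunk is dominated by FFT convolutions and depthwise convs, and cross-chunk information flows only through the recurrent state and the retrieved chunk embeddings. Causal padding, normalization, and training details are omitted, and the paper presumably uses a proper S4 parameterization rather than a raw learned kernel.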

@article{kiruluta2025_2506.01963,
  title={Breaking Quadratic Barriers: A Non-Attention LLM for Ultra-Long Context Horizons},
  author={Andrew Kiruluta and Preethi Raju and Priscilla Burity},
  journal={arXiv preprint arXiv:2506.01963},
  year={2025}
}