Fundamental Limitations on Subquadratic Alternatives to Transformers
v2 (latest)

5 October 2024
Josh Alman
Hantao Yu
arXiv:2410.04271

Papers citing "Fundamental Limitations on Subquadratic Alternatives to Transformers"

42 papers shown (newest first)

Two Heads Are Better than One: Simulating Large Transformers with Small Ones
Hantao Yu, Josh Alman · 13 Jun 2025 · 0 citations

SLICK: Selective Localization and Instance Calibration for Knowledge-Enhanced Car Damage Segmentation in Automotive Insurance
Teerapong Panboonyuen · 12 Jun 2025 · 0 citations

Subquadratic Algorithms and Hardness for Attention with Any Temperature
Shreya Gupta, Boyang Huang, Barna Saha, Yinzhan Xu, Christopher Ye · 20 May 2025 · 2 citations

Compression Barriers for Autoregressive Transformers
Themistoklis Haris, Krzysztof Onak · 21 Feb 2025 · 1 citation

Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models
Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo, Zhao Song, Han Liu · 05 Jun 2024 · 29 citations

Transformers, parallel computation, and logarithmic depth
Clayton Sanford, Daniel J. Hsu, Matus Telgarsky · 14 Feb 2024 · 42 citations

On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis
Jerry Yao-Chieh Hu, Thomas Lin, Zhao Song, Han Liu · 07 Feb 2024 · 41 citations

Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu, Tri Dao · 01 Dec 2023 · 2,839 citations

What Formal Languages Can Transformers Express? A Survey
Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin · 01 Nov 2023 · 60 citations

The Expressive Power of Transformers with Chain of Thought
William Merrill, Ashish Sabharwal · 11 Oct 2023 · 41 citations

HyperAttention: Long-context Attention in Near-Linear Time
Insu Han, Rajesh Jayaram, Amin Karbasi, Vahab Mirrokni, David P. Woodruff, A. Zandieh · 09 Oct 2023 · 74 citations

Representational Strengths and Limitations of Transformers
Clayton Sanford, Daniel J. Hsu, Matus Telgarsky · 05 Jun 2023 · 93 citations

GPT-4 Technical Report
OpenAI: Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, ..., Shengjia Zhao, Tianhao Zheng, Juntang Zhuang, William Zhuk, Barret Zoph · 15 Mar 2023 · 14,852 citations

Fast Attention Requires Bounded Entries
Josh Alman, Zhao Song · 26 Feb 2023 · 86 citations

KDEformer: Accelerating Transformers via Kernel Density Estimation
A. Zandieh, Insu Han, Majid Daliri, Amin Karbasi · 05 Feb 2023 · 47 citations

Transformers Learn Shortcuts to Automata
Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang · 19 Oct 2022 · 178 citations

On The Computational Complexity of Self-Attention
Feyza Duman Keles, Pruthuvi Maheshakya Wijewardena, Chinmay Hegde · 11 Sep 2022 · 130 citations

Formal Algorithms for Transformers
Mary Phuong, Marcus Hutter · 19 Jul 2022 · 75 citations

Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity
Sophie Hao, Dana Angluin, Robert Frank · 13 Apr 2022 · 80 citations

Overcoming a Theoretical Limitation of Self-Attention
David Chiang, Peter A. Cholak · 24 Feb 2022 · 84 citations

Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers
Colin Wei, Yining Chen, Tengyu Ma · 28 Jul 2021 · 92 citations

Saturated Transformers are Constant-Depth Threshold Circuits
William Merrill, Ashish Sabharwal, Noah A. Smith · 30 Jun 2021 · 107 citations

Self-Attention Networks Can Process Bounded Hierarchical Languages
Shunyu Yao, Binghui Peng, Christos H. Papadimitriou, Karthik Narasimhan · 24 May 2021 · 83 citations

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby · 22 Oct 2020 · 41,877 citations

SMYRF: Efficient Attention using Asymmetric Clustering
Giannis Daras, Nikita Kitaev, Augustus Odena, A. Dimakis · 11 Oct 2020 · 46 citations

Rethinking Attention with Performers
K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller · 30 Sep 2020 · 1,605 citations

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret · 29 Jun 2020 · 1,800 citations

Linformer: Self-Attention with Linear Complexity
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma · 08 Jun 2020 · 1,720 citations

Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei · 28 May 2020 · 42,712 citations

End-to-End Object Detection with Transformers
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko · 26 May 2020 · 13,230 citations

Synthesizer: Rethinking Self-Attention in Transformer Models
Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng · 02 May 2020 · 342 citations

Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan · 10 Apr 2020 · 4,109 citations

Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier · 12 Mar 2020 · 606 citations

Statistical Analysis of Nearest Neighbor Methods for Anomaly Detection
Xiaoyi Gu, Leman Akoglu, Alessandro Rinaldo · 08 Jul 2019 · 96 citations

XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang, Zihang Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le · 19 Jun 2019 · 8,464 citations

Theoretical Limitations of Self-Attention in Neural Sequence Models
Michael Hahn · 16 Jun 2019 · 276 citations

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova · 11 Oct 2018 · 95,604 citations

Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Arthur Jacot, Franck Gabriel, Clément Hongler · 20 Jun 2018 · 3,226 citations

Spectrally-normalized margin bounds for neural networks
Peter L. Bartlett, Dylan J. Foster, Matus Telgarsky · 26 Jun 2017 · 1,225 citations

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin · 12 Jun 2017 · 133,429 citations

Distributed Representations of Sentences and Documents
Quoc V. Le, Tomas Mikolov · 16 May 2014 · 9,267 citations

On the Number of Linear Regions of Deep Neural Networks
Guido Montúfar, Razvan Pascanu, Kyunghyun Cho, Yoshua Bengio · 08 Feb 2014 · 1,256 citations