arXiv:2410.04271 (v2, latest)
Fundamental Limitations on Subquadratic Alternatives to Transformers
5 October 2024
Josh Alman
Hantao Yu
Papers citing "Fundamental Limitations on Subquadratic Alternatives to Transformers" (42 papers)

| Title | Authors | Tags | Citations | Date |
|---|---|---|---|---|
| Two Heads Are Better than One: Simulating Large Transformers with Small Ones | Hantao Yu, Josh Alman | | 0 | 13 Jun 2025 |
| SLICK: Selective Localization and Instance Calibration for Knowledge-Enhanced Car Damage Segmentation in Automotive Insurance | Teerapong Panboonyuen | | 0 | 12 Jun 2025 |
| Subquadratic Algorithms and Hardness for Attention with Any Temperature | Shreya Gupta, Boyang Huang, Barna Saha, Yinzhan Xu, Christopher Ye | | 2 | 20 May 2025 |
| Compression Barriers for Autoregressive Transformers | Themistoklis Haris, Krzysztof Onak | | 1 | 21 Feb 2025 |
| Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models | Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo, Zhao Song, Han Liu | | 29 | 05 Jun 2024 |
| Transformers, parallel computation, and logarithmic depth | Clayton Sanford, Daniel J. Hsu, Matus Telgarsky | | 42 | 14 Feb 2024 |
| On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis | Jerry Yao-Chieh Hu, Thomas Lin, Zhao Song, Han Liu | | 41 | 07 Feb 2024 |
| Mamba: Linear-Time Sequence Modeling with Selective State Spaces | Albert Gu, Tri Dao | Mamba | 2,839 | 01 Dec 2023 |
| What Formal Languages Can Transformers Express? A Survey | Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin | AI4CE | 60 | 01 Nov 2023 |
| The Expressive Power of Transformers with Chain of Thought | William Merrill, Ashish Sabharwal | LRM, AI4CE, ReLM | 0 | 11 Oct 2023 |
| HyperAttention: Long-context Attention in Near-Linear Time | Insu Han, Rajesh Jayaram, Amin Karbasi, Vahab Mirrokni, David P. Woodruff, A. Zandieh | | 74 | 09 Oct 2023 |
| Representational Strengths and Limitations of Transformers | Clayton Sanford, Daniel J. Hsu, Matus Telgarsky | | 93 | 05 Jun 2023 |
| GPT-4 Technical Report | OpenAI (Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, …, Shengjia Zhao, Tianhao Zheng, Juntang Zhuang, William Zhuk, Barret Zoph) | LLMAG, MLLM | 14,853 | 15 Mar 2023 |
| Fast Attention Requires Bounded Entries | Josh Alman, Zhao Song | | 86 | 26 Feb 2023 |
| KDEformer: Accelerating Transformers via Kernel Density Estimation | A. Zandieh, Insu Han, Majid Daliri, Amin Karbasi | | 47 | 05 Feb 2023 |
| Transformers Learn Shortcuts to Automata | Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang | OffRL, LRM | 178 | 19 Oct 2022 |
| On The Computational Complexity of Self-Attention | Feyza Duman Keles, Pruthuvi Maheshakya Wijewardena, Chinmay Hegde | | 130 | 11 Sep 2022 |
| Formal Algorithms for Transformers | Mary Phuong, Marcus Hutter | | 75 | 19 Jul 2022 |
| Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity | Sophie Hao, Dana Angluin, Robert Frank | | 80 | 13 Apr 2022 |
| Overcoming a Theoretical Limitation of Self-Attention | David Chiang, Peter A. Cholak | | 84 | 24 Feb 2022 |
| Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers | Colin Wei, Yining Chen, Tengyu Ma | | 92 | 28 Jul 2021 |
| Saturated Transformers are Constant-Depth Threshold Circuits | William Merrill, Ashish Sabharwal, Noah A. Smith | | 107 | 30 Jun 2021 |
| Self-Attention Networks Can Process Bounded Hierarchical Languages | Shunyu Yao, Binghui Peng, Christos H. Papadimitriou, Karthik Narasimhan | | 83 | 24 May 2021 |
| An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, …, Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby | ViT | 41,877 | 22 Oct 2020 |
| SMYRF: Efficient Attention using Asymmetric Clustering | Giannis Daras, Nikita Kitaev, Augustus Odena, A. Dimakis | | 46 | 11 Oct 2020 |
| Rethinking Attention with Performers | K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, …, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller | | 1,605 | 30 Sep 2020 |
| Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention | Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret | | 1,800 | 29 Jun 2020 |
| Linformer: Self-Attention with Linear Complexity | Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma | | 1,720 | 08 Jun 2020 |
| Language Models are Few-Shot Learners | Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, …, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei | BDL | 42,712 | 28 May 2020 |
| End-to-End Object Detection with Transformers | Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko | ViT, 3DV, PINN | 13,230 | 26 May 2020 |
| Synthesizer: Rethinking Self-Attention in Transformer Models | Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng | | 342 | 02 May 2020 |
| Longformer: The Long-Document Transformer | Iz Beltagy, Matthew E. Peters, Arman Cohan | RALM, VLM | 4,109 | 10 Apr 2020 |
| Efficient Content-Based Sparse Attention with Routing Transformers | Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier | MoE | 606 | 12 Mar 2020 |
| Statistical Analysis of Nearest Neighbor Methods for Anomaly Detection | Xiaoyi Gu, Leman Akoglu, Alessandro Rinaldo | | 96 | 08 Jul 2019 |
| XLNet: Generalized Autoregressive Pretraining for Language Understanding | Zhilin Yang, Zihang Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le | AI4CE | 8,464 | 19 Jun 2019 |
| Theoretical Limitations of Self-Attention in Neural Sequence Models | Michael Hahn | | 276 | 16 Jun 2019 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova | VLM, SSL, SSeg | 95,604 | 11 Oct 2018 |
| Neural Tangent Kernel: Convergence and Generalization in Neural Networks | Arthur Jacot, Franck Gabriel, Clément Hongler | | 3,226 | 20 Jun 2018 |
| Spectrally-normalized margin bounds for neural networks | Peter L. Bartlett, Dylan J. Foster, Matus Telgarsky | ODL | 1,225 | 26 Jun 2017 |
| Attention Is All You Need | Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin | 3DV | 133,443 | 12 Jun 2017 |
| Distributed Representations of Sentences and Documents | Quoc V. Le, Tomas Mikolov | FaML | 9,267 | 16 May 2014 |
| On the Number of Linear Regions of Deep Neural Networks | Guido Montúfar, Razvan Pascanu, Kyunghyun Cho, Yoshua Bengio | | 1,256 | 08 Feb 2014 |