Representational Strengths and Limitations of Transformers
arXiv: 2306.02896 (v2, latest)
5 June 2023
Clayton Sanford, Daniel J. Hsu, Matus Telgarsky
Links: ArXiv (abs) · PDF · HTML
Papers citing "Representational Strengths and Limitations of Transformers" (26 papers shown)
Two Heads Are Better than One: Simulating Large Transformers with Small Ones
Hantao Yu, Josh Alman (13 Jun 2025)

Asymptotics of SGD in Sequence-Single Index Models and Single-Layer Attention Networks
Luca Arnaboldi, Bruno Loureiro, Ludovic Stephan, Florent Krzakala, Lenka Zdeborová (03 Jun 2025)

Leaner Transformers: More Heads, Less Depth
Hemanth Saratchandran, Damien Teney, Simon Lucey (27 May 2025)

Born a Transformer -- Always a Transformer?
Yana Veitsman, Mayank Jobanputra, Yash Sarrof, Aleksandra Bakalova, Vera Demberg, Ellie Pavlick, Michael Hahn (27 May 2025)

Lost in Transmission: When and Why LLMs Fail to Reason Globally
Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville (13 May 2025) [LRM]

Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
Arvid Frydenlund (13 Mar 2025) [LRM]

The Role of Sparsity for Length Generalization in Transformers
Noah Golowich, Samy Jelassi, David Brandfonbrener, Sham Kakade, Eran Malach (24 Feb 2025)

Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
Yuri Kuratov, M. Arkhipov, Aydar Bulatov, Andrey Kravchenko (18 Feb 2025)

Provably Overwhelming Transformer Models with Designed Inputs
Lev Stambler, Seyed Sajjad Nezhadi, Matthew Coudron (09 Feb 2025)

Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn (04 Feb 2025) [LRM]

Strassen Attention: Unlocking Compositional Abilities in Transformers Based on a New Lower Bound Method
Alexander Kozachinskiy, Felipe Urrutia, Hector Jimenez, Tomasz Steifer, Germán Pizarro, Matías Fuentes, Francisco Meza, Cristian Buc, Cristóbal Rojas (31 Jan 2025)

A completely uniform transformer for parity
Alexander Kozachinskiy, Tomasz Steifer (07 Jan 2025)

Approximation Rate of the Transformer Architecture for Sequence Modeling
Hao Jiang, Qianxiao Li (03 Jan 2025)

Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning
Md Rifat Arefin, G. Subbaraj, Nicolas Angelard-Gontier, Yann LeCun, Irina Rish, Ravid Shwartz-Ziv, C. Pal (04 Nov 2024) [LRM]

Fundamental Limitations on Subquadratic Alternatives to Transformers
Josh Alman, Hantao Yu (05 Oct 2024)

ENTP: Encoder-only Next Token Prediction
Ethan Ewer, Daewon Chae, Thomas Zeng, Jinkyu Kim, Kangwook Lee (02 Oct 2024)

Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective
Yotam Wolf, Binyamin Rothberg, Dorin Shteyman, Amnon Shashua (26 Sep 2024)

When big data actually are low-rank, or entrywise approximation of certain function-generated matrices
Stanislav Budzinskiy (03 Jul 2024)

Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro, Vijay Srinivas Agneeswaran (24 Apr 2024) [Mamba]

Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis (08 Feb 2024)

Sample, estimate, aggregate: A recipe for causal discovery foundation models
Menghua Wu, Yujia Bao, Regina Barzilay, Tommi Jaakkola (02 Feb 2024) [CML]

An Information-Theoretic Analysis of In-Context Learning
Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy (28 Jan 2024)

Convergence of Two-Layer Regression with Nonlinear Units
Yichuan Deng, Zhao Song, Shenghao Xie (16 Aug 2023)

H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu Zhang, Ying Sheng, Dinesh Manocha, Tianlong Chen, Lianmin Zheng, ..., Yuandong Tian, Christopher Ré, Clark W. Barrett, Zhangyang Wang, Beidi Chen (24 Jun 2023) [VLM]

Birth of a Transformer: A Memory Viewpoint
A. Bietti, Vivien A. Cabannes, Diane Bouchacourt, Hervé Jégou, Léon Bottou (01 Jun 2023)

Streaming Kernel PCA Algorithm With Small Space
Yichuan Deng, Zhao Song, Zifan Wang, Hangke Zhang (08 Mar 2023)