Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot
arXiv: 2406.06893 · 11 June 2024
Zixuan Wang, Stanley Wei, Daniel Hsu, Jason D. Lee
Papers citing "Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot" (10 / 10 papers shown):
1. Asymptotics of SGD in Sequence-Single Index Models and Single-Layer Attention Networks. Luca Arnaboldi, Bruno Loureiro, Ludovic Stephan, Florent Krzakala, Lenka Zdeborová. 03 Jun 2025.
2. Transformers Can Overcome the Curse of Dimensionality: A Theoretical Study from an Approximation Perspective. Yuling Jiao, Yanming Lai, Yang Wang, Bokai Yan. 18 Apr 2025.
3. Gating is Weighting: Understanding Gated Linear Attention through In-context Learning. Yingcong Li, Davoud Ataee Tarzanagh, A. S. Rawat, Maryam Fazel, Samet Oymak. 06 Apr 2025.
4. When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective. Alireza Mousavi-Hosseini, Clayton Sanford, Denny Wu, Murat A. Erdogdu. 14 Mar 2025.
5. On the Robustness of Transformers against Context Hijacking for Linear Classification. Tianle Li, Chenyang Zhang, Xingwu Chen, Yuan Cao, Difan Zou. 24 Feb 2025.
6. The Role of Sparsity for Length Generalization in Transformers. Noah Golowich, Samy Jelassi, David Brandfonbrener, Sham Kakade, Eran Malach. 24 Feb 2025.
7. Training Dynamics of In-Context Learning in Linear Attention. Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe. 27 Jan 2025.
8. On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures. Wei Shen, Ruida Zhou, Jing Yang, Cong Shen. 15 Oct 2024.
9. From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency. Kaiyue Wen, Huaqing Zhang, Hongzhou Lin, Jingzhao Zhang. 07 Oct 2024.
10. Attention layers provably solve single-location regression. Pierre Marion, Raphael Berthier, Gérard Biau, Claire Boyer. 02 Oct 2024.