One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
Arvind V. Mahankali, Tatsunori B. Hashimoto, Tengyu Ma (MLT)
arXiv:2307.03576, 7 July 2023

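To make the title's claim concrete: the paper studies a single layer of linear self-attention trained on in-context linear regression prompts, and shows that the pretraining-optimal weights implement one step of gradient descent on the in-context least-squares loss (the construction made explicit in von Oswald et al., cited below). A minimal numpy sketch of that equivalence follows; the weight choice is one illustrative construction, and eta, d, and n are assumed values, not numbers from the paper.

import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 5, 20, 0.1  # illustrative dimensions and step size

# In-context linear regression prompt: n (x_i, y_i) pairs plus a query x_q.
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star
x_q = rng.normal(size=d)

# One gradient-descent step from w = 0 on the in-context loss
# L(w) = 1/(2n) * sum_i (w . x_i - y_i)^2, then predict on x_q.
w_gd = eta * (X.T @ y) / n
pred_gd = w_gd @ x_q

# One layer of *linear* self-attention (no softmax) on tokens
# e_i = [x_i; y_i], with query token e_q = [x_q; 0]: when keys/queries
# read the x-part and values read the y-part, the output at the query
# token is (eta / n) * sum_i y_i * <x_i, x_q>.
scores = X @ x_q                          # unnormalized scores <x_i, x_q>
pred_attn = eta / n * np.sum(y * scores)  # linear-attention prediction

assert np.allclose(pred_gd, pred_attn)    # the two predictions coincide
print(pred_gd, pred_attn)

Both predictions reduce to (eta / n) * sum_i y_i <x_i, x_q> for any eta; the paper's contribution is that such a gradient-descent step is not merely realizable but provably optimal for this architecture under its pretraining objective.
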
Papers citing "One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention" (15 of 15 shown)

Minimalist Softmax Attention Provably Learns Constrained Boolean Functions
Jerry Yao-Chieh Hu, Xiwen Zhang, Maojiang Su, Zhao Song, Han Liu (MLT)
26 May 2025

Adversarially Pretrained Transformers may be Universally Robust In-Context Learners
Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki (AAML)
20 May 2025

How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang, Yingbin Liang, Jing Yang
02 May 2025

An extension of linear self-attention for in-context learning
Katsuyuki Hagiwara
31 Mar 2025

Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou
21 Feb 2025

Vector-ICL: In-context Learning with Continuous Vector Representations
Yufan Zhuang, Chandan Singh, Liyuan Liu, Jingbo Shang, Jianfeng Gao
21 Feb 2025

Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song, Zhuoyan Xu, Yiqiao Zhong
31 Dec 2024

On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
Renpu Liu, Ruida Zhou, Cong Shen, Jing Yang
17 Oct 2024

In-context learning and Occam's razor
Eric Elmoznino, Tom Marty, Tejas Kasetty, Léo Gagnon, Sarthak Mittal, Mahan Fathi, Dhanya Sridhar, Guillaume Lajoie
17 Oct 2024

On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures
Wei Shen, Ruida Zhou, Jing Yang, Cong Shen
15 Oct 2024

Transformers are Provably Optimal In-context Estimators for Wireless Communications
Vishnu Teja Kunde, Vicram Rajagopalan, Chandra Shekhara Kaushik Valmeekam, Krishna R. Narayanan, S. Shakkottai, D. Kalathil, J. Chamberland
01 Nov 2023

Transformers learn in-context by gradient descent
J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov (MLT)
15 Dec 2022

Transformers Learn Shortcuts to Automata
Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang (OffRL, LRM)
19 Oct 2022

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant
01 Aug 2022

An Explanation of In-context Learning as Implicit Bayesian Inference
Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma (ReLM, BDL, VPVLM, LRM)
03 Nov 2021