One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
Arvind V. Mahankali, Tatsunori B. Hashimoto, Tengyu Ma (MLT)
arXiv:2307.03576, 7 July 2023

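To make the title's claim concrete: the paper studies a single layer of linear self-attention trained on in-context linear regression prompts, and shows that the pretraining-optimal weights implement one step of gradient descent on the in-context least-squares loss (the construction made explicit in von Oswald et al., cited below). A minimal numpy sketch of that equivalence follows; the weight choice is one illustrative construction, and eta, d, and n are assumed values, not numbers from the paper.

import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 5, 20, 0.1  # illustrative dimensions and step size

# In-context linear regression prompt: n (x_i, y_i) pairs plus a query x_q.
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star
x_q = rng.normal(size=d)

# One gradient-descent step from w = 0 on the in-context loss
# L(w) = 1/(2n) * sum_i (w . x_i - y_i)^2, then predict on x_q.
w_gd = eta * (X.T @ y) / n
pred_gd = w_gd @ x_q

# One layer of *linear* self-attention (no softmax) on tokens
# e_i = [x_i; y_i], with query token e_q = [x_q; 0]: when keys/queries
# read the x-part and values read the y-part, the output at the query
# token is (eta / n) * sum_i y_i * <x_i, x_q>.
scores = X @ x_q                          # unnormalized scores <x_i, x_q>
pred_attn = eta / n * np.sum(y * scores)  # linear-attention prediction

assert np.allclose(pred_gd, pred_attn)    # the two predictions coincide
print(pred_gd, pred_attn)

Both predictions reduce to (eta / n) * sum_i y_i <x_i, x_q> for any eta; the paper's contribution is that such a gradient-descent step is not merely realizable but provably optimal for this architecture under its pretraining objective.
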
Papers citing "One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention" (15 of 15 shown)

Minimalist Softmax Attention Provably Learns Constrained Boolean Functions
Jerry Yao-Chieh Hu, Xiwen Zhang, Maojiang Su, Zhao Song, Han Liu (MLT)
26 May 2025

Adversarially Pretrained Transformers may be Universally Robust In-Context Learners
Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki (AAML)
20 May 2025

How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang, Yingbin Liang, Jing Yang
02 May 2025

An extension of linear self-attention for in-context learning
Katsuyuki Hagiwara
31 Mar 2025

Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou
21 Feb 2025

Vector-ICL: In-context Learning with Continuous Vector Representations
Yufan Zhuang, Chandan Singh, Liyuan Liu, Jingbo Shang, Jianfeng Gao
21 Feb 2025

Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song, Zhuoyan Xu, Yiqiao Zhong
31 Dec 2024

On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
Renpu Liu, Ruida Zhou, Cong Shen, Jing Yang
17 Oct 2024

In-context learning and Occam's razor
Eric Elmoznino, Tom Marty, Tejas Kasetty, Léo Gagnon, Sarthak Mittal, Mahan Fathi, Dhanya Sridhar, Guillaume Lajoie
17 Oct 2024

On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures
Wei Shen, Ruida Zhou, Jing Yang, Cong Shen
15 Oct 2024

Transformers are Provably Optimal In-context Estimators for Wireless Communications
Vishnu Teja Kunde, Vicram Rajagopalan, Chandra Shekhara Kaushik Valmeekam, Krishna R. Narayanan, S. Shakkottai, D. Kalathil, J. Chamberland
01 Nov 2023

Transformers learn in-context by gradient descent
J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov (MLT)
15 Dec 2022

Transformers Learn Shortcuts to Automata
Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang (OffRL, LRM)
19 Oct 2022

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant
01 Aug 2022

An Explanation of In-context Learning as Implicit Bayesian Inference
Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma (ReLM, BDL, VPVLM, LRM)
03 Nov 2021