One Step of Gradient Descent is Provably the Optimal In-Context Learner
with One Layer of Linear Self-Attention

One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention

7 July 2023

Arvind V. Mahankali

Tatsunori B. Hashimoto

ArXiv (abs)PDF HTML

Papers citing "One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention"

15 / 15 papers shown

Title
Minimalist Softmax Attention Provably Learns Constrained Boolean Functions Jerry Yao-Chieh Hu Xiwen Zhang Maojiang Su Zhao Song Han Liu MLT 220 1 0 26 May 2025
Adversarially Pretrained Transformers may be Universally Robust In-Context Learners Soichiro Kumano Hiroshi Kera Toshihiko Yamasaki AAML 117 0 0 20 May 2025
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias Ruiquan Huang Yingbin Liang Jing Yang 109 0 0 02 May 2025
An extension of linear self-attention for in-context learning Katsuyuki Hagiwara 86 0 0 31 Mar 2025
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers Yingyu Liang Zhizhou Sha Zhenmei Shi Zhao Song Yufa Zhou 157 19 0 21 Feb 2025
Vector-ICL: In-context Learning with Continuous Vector Representations Yufan Zhuang Chandan Singh Liyuan Liu Jingbo Shang Jianfeng Gao 128 6 0 21 Feb 2025
Out-of-distribution generalization via composition: a lens through induction heads in Transformers Jiajun Song Zhuoyan Xu Yiqiao Zhong 145 10 0 31 Dec 2024
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery Renpu Liu Ruida Zhou Cong Shen Jing Yang 110 0 0 17 Oct 2024
In-context learning and Occam's razor Eric Elmoznino Tom Marty Tejas Kasetty Léo Gagnon Sarthak Mittal Mahan Fathi Dhanya Sridhar Guillaume Lajoie 119 1 0 17 Oct 2024
On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures Wei Shen Ruida Zhou Jing Yang Cong Shen 64 4 0 15 Oct 2024
Transformers are Provably Optimal In-context Estimators for Wireless Communications Vishnu Teja Kunde Vicram Rajagopalan Chandra Shekhara Kaushik Valmeekam Krishna R. Narayanan S. Shakkottai D. Kalathil J. Chamberland 103 6 0 01 Nov 2023
Transformers learn in-context by gradient descent J. Oswald Eyvind Niklasson E. Randazzo João Sacramento A. Mordvintsev A. Zhmoginov Max Vladymyrov MLT 116 494 0 15 Dec 2022
Transformers Learn Shortcuts to Automata Bingbin Liu Jordan T. Ash Surbhi Goel A. Krishnamurthy Cyril Zhang OffRL LRM 135 176 0 19 Oct 2022
What Can Transformers Learn In-Context? A Case Study of Simple Function Classes Shivam Garg Dimitris Tsipras Percy Liang Gregory Valiant 141 513 0 01 Aug 2022
An Explanation of In-context Learning as Implicit Bayesian Inference Sang Michael Xie Aditi Raghunathan Percy Liang Tengyu Ma ReLM BDL VPVLM LRM 208 763 0 03 Nov 2021

We use cookies and other tracking technologies to improve your browsing experience on our website, to show you personalized content and targeted ads, to analyze our website traffic, and to understand where our visitors are coming from. See our policy.