ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2307.03576
  4. Cited By
One Step of Gradient Descent is Provably the Optimal In-Context Learner
  with One Layer of Linear Self-Attention

One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention

7 July 2023
Arvind V. Mahankali
Tatsunori B. Hashimoto
Tengyu Ma
    MLT
ArXiv (abs)PDFHTML

Papers citing "One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention"

15 / 15 papers shown
Title
Minimalist Softmax Attention Provably Learns Constrained Boolean Functions
Minimalist Softmax Attention Provably Learns Constrained Boolean Functions
Jerry Yao-Chieh Hu
Xiwen Zhang
Maojiang Su
Zhao Song
Han Liu
MLT
220
1
0
26 May 2025
Adversarially Pretrained Transformers may be Universally Robust In-Context Learners
Adversarially Pretrained Transformers may be Universally Robust In-Context Learners
Soichiro Kumano
Hiroshi Kera
Toshihiko Yamasaki
AAML
117
0
0
20 May 2025
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang
Yingbin Liang
Jing Yang
109
0
0
02 May 2025
An extension of linear self-attention for in-context learning
An extension of linear self-attention for in-context learning
Katsuyuki Hagiwara
86
0
0
31 Mar 2025
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Yingyu Liang
Zhizhou Sha
Zhenmei Shi
Zhao Song
Yufa Zhou
157
19
0
21 Feb 2025
Vector-ICL: In-context Learning with Continuous Vector Representations
Vector-ICL: In-context Learning with Continuous Vector Representations
Yufan Zhuang
Chandan Singh
Liyuan Liu
Jingbo Shang
Jianfeng Gao
128
6
0
21 Feb 2025
Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song
Zhuoyan Xu
Yiqiao Zhong
145
10
0
31 Dec 2024
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
Renpu Liu
Ruida Zhou
Cong Shen
Jing Yang
110
0
0
17 Oct 2024
In-context learning and Occam's razor
In-context learning and Occam's razor
Eric Elmoznino
Tom Marty
Tejas Kasetty
Léo Gagnon
Sarthak Mittal
Mahan Fathi
Dhanya Sridhar
Guillaume Lajoie
119
1
0
17 Oct 2024
On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures
On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures
Wei Shen
Ruida Zhou
Jing Yang
Cong Shen
64
4
0
15 Oct 2024
Transformers are Provably Optimal In-context Estimators for Wireless Communications
Transformers are Provably Optimal In-context Estimators for Wireless Communications
Vishnu Teja Kunde
Vicram Rajagopalan
Chandra Shekhara Kaushik Valmeekam
Krishna R. Narayanan
S. Shakkottai
D. Kalathil
J. Chamberland
103
6
0
01 Nov 2023
Transformers learn in-context by gradient descent
Transformers learn in-context by gradient descent
J. Oswald
Eyvind Niklasson
E. Randazzo
João Sacramento
A. Mordvintsev
A. Zhmoginov
Max Vladymyrov
MLT
116
494
0
15 Dec 2022
Transformers Learn Shortcuts to Automata
Transformers Learn Shortcuts to Automata
Bingbin Liu
Jordan T. Ash
Surbhi Goel
A. Krishnamurthy
Cyril Zhang
OffRLLRM
135
176
0
19 Oct 2022
What Can Transformers Learn In-Context? A Case Study of Simple Function
  Classes
What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Shivam Garg
Dimitris Tsipras
Percy Liang
Gregory Valiant
141
513
0
01 Aug 2022
An Explanation of In-context Learning as Implicit Bayesian Inference
An Explanation of In-context Learning as Implicit Bayesian Inference
Sang Michael Xie
Aditi Raghunathan
Percy Liang
Tengyu Ma
ReLMBDLVPVLMLRM
208
763
0
03 Nov 2021
1