Transformers learn to implement preconditioned gradient descent for in-context learning

Neural Information Processing Systems (NeurIPS), 2023
1 June 2023
Kwangjun Ahn
Xiang Cheng
Hadi Daneshmand
Suvrit Sra

Papers citing "Transformers learn to implement preconditioned gradient descent for in-context learning"

50 / 92 papers shown
1. Genomic Next-Token Predictors are In-Context Learners. Nathan Breslow, Aayush Mishra, Mahler Revsine, Michael C. Schatz, Anqi Liu, Daniel Khashabi. 16 Nov 2025.
2. Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding. Qian Ma, Ruoxiang Xu, Yongqiang Cai. 09 Nov 2025.
3. Optimal Attention Temperature Enhances In-Context Learning under Distribution Shift. Samet Demir, Zafer Dogan. 03 Nov 2025.
4. Provable test-time adaptivity and distributional robustness of in-context learning. Tianyi Ma, Tengyao Wang, R. Samworth. 27 Oct 2025.
5. A Framework for Quantifying How Pre-Training and Context Benefit In-Context Learning. Bingqing Song, Jiaxiang Li, Rong Wang, Songtao Lu, Mingyi Hong. 26 Oct 2025.
6. Transformers are almost optimal metalearners for linear classification. Roey Magen, Gal Vardi. 22 Oct 2025.
7. Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions. Yanna Ding, Songtao Lu, Yingdong Lu, T. Nowicki, Jianxi Gao. 21 Oct 2025.
8. Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads. Zhoutong Wu, Y. Zhang, Yiming Dong, Chenheng Zhang, Cong Fang, Kun Yuan, Zhouchen Lin. 19 Oct 2025.
9. Mamba Can Learn Low-Dimensional Targets In-Context via Test-Time Feature Learning. Junsoo Oh, Wei Huang, Taiji Suzuki. 14 Oct 2025.
10. Softmax ≥ Linear: Transformers may learn to classify in-context by kernel gradient descent. Sara Dragutinovic, Andrew Saxe, Aaditya K. Singh. 12 Oct 2025.
11. Fine-Grained Emotion Recognition via In-Context Learning. Zhaochun Ren, Zhou Yang, Chenglong Ye, Haizhou Sun, Chao Chen, Xiaofei Zhu, Xiangwen Liao. 08 Oct 2025.
12. Learning Linear Regression with Low-Rank Tasks in-Context. Kaito Takanami, Takashi Takahashi, Y. Kabashima. 06 Oct 2025.
13. Continual Learning with Query-Only Attention. Gautham Udayakumar Bekal, Ashish Pujari, Scott David Kelly. 01 Oct 2025.
14. Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression. Yifei Zuo, Yutong Yin, Zhichen Zeng, Ang Li, Banghua Zhu, Zhaoran Wang. 01 Oct 2025.
15. Theory of Scaling Laws for In-Context Regression: Depth, Width, Context and Time. Blake Bordelon, Mary I. Letey, Cengiz Pehlevan. 01 Oct 2025.
16. Pretrain-Test Task Alignment Governs Generalization in In-Context Learning. Mary I. Letey, Jacob A. Zavatone-Veth, Yue M. Lu, Cengiz Pehlevan. 30 Sep 2025.
17. Statistical Advantage of Softmax Attention: Insights from Single-Location Regression. O. Duranthon, P. Marion, C. Boyer, B. Loureiro, L. Zdeborová. 26 Sep 2025.
18. Linear Transformers Implicitly Discover Unified Numerical Algorithms. Patrick Lutz, Aditya Gangrade, Hadi Daneshmand, Venkatesh Saligrama. 24 Sep 2025.
19. Towards Provable Emergence of In-Context Reinforcement Learning. Jiuqi Wang, Rohan Chandra, Shangtong Zhang. 22 Sep 2025.
20. Selective Induction Heads: How Transformers Select Causal Structures In Context (ICLR 2025). Francesco D'Angelo, Francesco Croce, Nicolas Flammarion. 09 Sep 2025.
21. Towards Theoretical Understanding of Transformer Test-Time Computing: Investigation on In-Context Linear Regression. Xingwu Chen, Miao Lu, Beining Wu, Difan Zou. 11 Aug 2025.
22. Provable Low-Frequency Bias of In-Context Learning of Representations. Yongyi Yang, Hidenori Tanaka, Wei Hu. 17 Jul 2025.
23. When and How Unlabeled Data Provably Improve In-Context Learning. Yingcong Li, Xiangyu Chang, Muti Kara, Xiaofeng Liu, Amit K. Roy-Chowdhury, Samet Oymak. 18 Jun 2025.
24. Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations. Yuxin Dong, Jiachen Jiang, Zhihui Zhu, Xia Ning. 10 Jun 2025.
25. Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks. D. Kunin, Giovanni Luca Marchetti, F. Chen, Dhruva Karkada, James B. Simon, M. DeWeese, Surya Ganguli, Nina Miolane. 06 Jun 2025.
26. When can in-context learning generalize out of task distribution? Chase Goddard, Lindsay M. Smith, Vudtiwat Ngampruetikorn, David J. Schwab. 05 Jun 2025.
27. Optimization-Inspired Few-Shot Adaptation for Large Language Models. Boyan Gao, Xin Wang, Jianlong Wu, David A. Clifton. 25 May 2025.
28. Continuum Transformers Perform In-Context Learning by Operator Gradient Descent. Abhiti Mishra, Yash Patel, Ambuj Tewari. 23 May 2025.
29. From Compression to Expression: A Layerwise Analysis of In-Context Learning. Jiachen Jiang, Yuxin Dong, Jinxin Zhou, Zhihui Zhu. 22 May 2025.
30. Adversarially Pretrained Transformers may be Universally Robust In-Context Learners. Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki. 20 May 2025.
31. Attention-based clustering. Rodrigo Maulen-Soto, Claire Boyer, Pierre Marion. 19 May 2025.
32. Rethinking Invariance in In-context Learning (ICLR 2025). Lizhe Fang, Yifei Wang, Khashayar Gatmiry, Lei Fang, Yun Wang. 08 May 2025.
33. How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias. Ruiquan Huang, Yingbin Liang, Jing Yang. 02 May 2025.
34. Exact Learning Dynamics of In-Context Learning in Linear Transformers and Its Application to Non-Linear Transformers. Nischal Mainali, Lucas Teixeira. 17 Apr 2025.
35. Can Pre-training Indicators Reliably Predict Fine-tuning Outcomes of LLMs? Hansi Zeng, Kai Hui, Honglei Zhuang, Zhen Qin, Zhenrui Yue, Hamed Zamani, Dana Alon. 16 Apr 2025.
36. Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B. Aleksandra Bakalova, Yana Veitsman, Xinting Huang, Michael Hahn. 31 Mar 2025.
37. Decision Feedback In-Context Learning for Wireless Symbol Detection. Li Fan, Jing Yang, Cong Shen. 20 Mar 2025.
38. Taming Knowledge Conflicts in Language Models. Gaotang Li, Yuzhong Chen, Hanghang Tong. 14 Mar 2025.
39. When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective. Alireza Mousavi-Hosseini, Clayton Sanford, Denny Wu, Murat A. Erdogdu. 14 Mar 2025.
40. Provable Benefits of Task-Specific Prompts for In-context Learning (AISTATS 2025). Xiangyu Chang, Yingcong Li, Muti Kara, Samet Oymak, Amit K. Roy-Chowdhury. 03 Mar 2025.
41. In-Context Learning with Hypothesis-Class Guidance. Ziqian Lin, Shubham Kumar Bharti, Kangwook Lee. 27 Feb 2025.
42. Towards Auto-Regressive Next-Token Prediction: In-Context Learning Emerges from Generalization (ICLR 2025). Zixuan Gong, Xiaolin Hu, Huayi Tang, Yong Liu. 24 Feb 2025.
43. On the Robustness of Transformers against Context Hijacking for Linear Classification. Tianle Li, Chenyang Zhang, Xingwu Chen, Yuan Cao, Difan Zou. 24 Feb 2025.
44. Vector-ICL: In-context Learning with Continuous Vector Representations (ICLR 2024). Yufan Zhuang, Chandan Singh, Liyuan Liu, Jingbo Shang, Jianfeng Gao. 21 Feb 2025.
45. Looped ReLU MLPs May Be All You Need as Practical Programmable Computers (AISTATS 2024). Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou. 21 Feb 2025.
46. Transformers versus the EM Algorithm in Multi-class Clustering. Yihan He, Hong-Yu Chen, Yuan Cao, Jianqing Fan, Han Liu. 09 Feb 2025.
47. Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data? (ICLR 2025). Yutong Yin, Zhaoran Wang. 27 Jan 2025.
48. Training Dynamics of In-Context Learning in Linear Attention. Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe. 27 Jan 2025.
49. Rethinking Associative Memory Mechanism in Induction Head. Shuo Wang, Issei Sato. 16 Dec 2024.
50. Re-examining learning linear functions in context (Deutsche Jahrestagung für Künstliche Intelligenz, KI 2024). Omar Naim, Guilhem Fouilhé, Nicholas Asher. 18 Nov 2024.