Transformers learn to implement preconditioned gradient descent for in-context learning

Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, S. Sra (1 June 2023) [ODL]
arXiv: 2306.00297
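For context, the result named in the title can be summarized as follows (a hedged sketch in our own notation, not the paper's exact statement): given in-context examples (x_i, y_i), i = 1, ..., n, the paper argues that the layers of a trained linear transformer implement preconditioned gradient descent on the in-context least-squares objective,

\[
  L(w) \;=\; \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \langle w, x_i\rangle\bigr)^2,
  \qquad
  w_{k+1} \;=\; w_k \;-\; P_k\,\nabla L(w_k),
\]

where each P_k is a preconditioning matrix determined by the learned attention weights of layer k.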

Papers citing "Transformers learn to implement preconditioned gradient descent for in-context learning"

Showing 50 of 127 citing papers.
Optimization-Inspired Few-Shot Adaptation for Large Language Models
  Boyan Gao, Xin Wang, Yibo Yang, David A. Clifton (25 May 2025)
Continuum Transformers Perform In-Context Learning by Operator Gradient Descent
  Abhiti Mishra, Yash Patel, Ambuj Tewari (23 May 2025)
From Compression to Expansion: A Layerwise Analysis of In-Context Learning
  Jiachen Jiang, Yuxin Dong, Jinxin Zhou, Zhihui Zhu (22 May 2025)
Out-of-Distribution Generalization of In-Context Learning: A Low-Dimensional Subspace Perspective
  Soo Min Kwon, Alec S. Xu, Can Yaras, Laura Balzano, Qing Qu (20 May 2025) [OOD]
Adversarially Pretrained Transformers may be Universally Robust In-Context Learners
  Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki (20 May 2025) [AAML]
Attention-based clustering
  Rodrigo Maulen-Soto, Claire Boyer, Pierre Marion (19 May 2025)
Rethinking Invariance in In-context Learning
  Lizhe Fang, Yifei Wang, Khashayar Gatmiry, Lei Fang, Yun Wang (08 May 2025)
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
  Ruiquan Huang, Yingbin Liang, Jing Yang (02 May 2025)
Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations
  Yiyou Sun, Y. Gai, Lijie Chen, Abhilasha Ravichander, Yejin Choi, D. Song (17 Apr 2025) [HILM]
Exact Learning Dynamics of In-Context Learning in Linear Transformers and Its Application to Non-Linear Transformers
  Nischal Mainali, Lucas Teixeira (17 Apr 2025)
Can Pre-training Indicators Reliably Predict Fine-tuning Outcomes of LLMs?
  Hansi Zeng, Kai Hui, Honglei Zhuang, Zhen Qin, Zhenrui Yue, Hamed Zamani, Dana Alon (16 Apr 2025)
Reasoning without Regret
  Tarun Chitra (14 Apr 2025) [OffRL, LRM]
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
  Yingcong Li, Davoud Ataee Tarzanagh, A. S. Rawat, Maryam Fazel, Samet Oymak (06 Apr 2025)
Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B
  Aleksandra Bakalova, Yana Veitsman, Xinting Huang, Michael Hahn (31 Mar 2025)
Experience Replay Addresses Loss of Plasticity in Continual Learning
  Jiuqi Wang, Rohan Chandra, Shangtong Zhang (25 Mar 2025) [CLL, KELM]
Transformer-based Wireless Symbol Detection Over Fading Channels
  Li Fan, Jing Yang, Cong Shen (20 Mar 2025)
Test-Time Training Provably Improves Transformers as In-context Learners
  Halil Alperen Gozeten, M. E. Ildiz, Xuechen Zhang, Mahdi Soltanolkotabi, Marco Mondelli, Samet Oymak (14 Mar 2025)
When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective
  Alireza Mousavi-Hosseini, Clayton Sanford, Denny Wu, Murat A. Erdogdu (14 Mar 2025)
Taming Knowledge Conflicts in Language Models
  Gaotang Li, Yuzhong Chen, Hanghang Tong (14 Mar 2025) [KELM]
Provable Benefits of Task-Specific Prompts for In-context Learning
  Xiangyu Chang, Yingcong Li, Muti Kara, Samet Oymak, Amit K. Roy-Chowdhury (03 Mar 2025)
In-Context Learning with Hypothesis-Class Guidance
  Ziqian Lin, Shubham Kumar Bharti, Kangwook Lee (27 Feb 2025)
On the Robustness of Transformers against Context Hijacking for Linear Classification
  Tianle Li, Chenyang Zhang, Xingwu Chen, Yuan Cao, Difan Zou (24 Feb 2025)
Towards Auto-Regressive Next-Token Prediction: In-Context Learning Emerges from Generalization
  Zixuan Gong, Xiaolin Hu, Huayi Tang, Yong Liu (24 Feb 2025)
Ask, and it shall be given: On the Turing completeness of prompting
  Ruizhong Qiu, Zhe Xu, Wenxuan Bao, Hanghang Tong (24 Feb 2025) [ReLM, LRM, AI4CE]
CoT-ICL Lab: A Synthetic Framework for Studying Chain-of-Thought Learning from In-Context Demonstrations
  Vignesh Kothapalli, Hamed Firooz, Maziar Sanjabi (21 Feb 2025)
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
  Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou (21 Feb 2025)
Vector-ICL: In-context Learning with Continuous Vector Representations
  Yufan Zhuang, Chandan Singh, Liyuan Liu, Jingbo Shang, Jianfeng Gao (21 Feb 2025)
Transformers versus the EM Algorithm in Multi-class Clustering
  Yihan He, Hong-Yu Chen, Yuan Cao, Jianqing Fan, Han Liu (09 Feb 2025)
Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
  Yutong Yin, Zhaoran Wang (27 Jan 2025) [LRM, ReLM]
Training Dynamics of In-Context Learning in Linear Attention
  Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe (27 Jan 2025) [MLT]
Understanding Knowledge Hijack Mechanism in In-context Learning through Associative Memory
  Shuo Wang, Issei Sato (16 Dec 2024)
4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion
  Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo, Willi Menapace, Aliaksandr Siarohin, Michael Vasilkovsky, Ivan Skorokhodov, Sergey Tulyakov, Peter Wonka, Hsin-Ying Lee (05 Dec 2024) [DiffM, VGen]
Re-examining learning linear functions in context
  Omar Naim, Guilhem Fouilhé, Nicholas Asher (18 Nov 2024)
One-Layer Transformer Provably Learns One-Nearest Neighbor In Context
  Zihao Li, Yuan Cao, Cheng Gao, Yihan He, Han Liu, Jason M. Klusowski, Jianqing Fan, Mengdi Wang (16 Nov 2024) [MLT]
Pretrained transformer efficiently learns low-dimensional target functions in-context
  Kazusato Oko, Yujin Song, Taiji Suzuki, Denny Wu (04 Nov 2024)
Abrupt Learning in Transformers: A Case Study on Matrix Completion
  Pulkit Gopalani, Ekdeep Singh Lubana, Wei Hu (29 Oct 2024)
On the Role of Depth and Looping for In-Context Learning with Task Diversity
  Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar (29 Oct 2024)
Provable optimal transport with transformers: The essence of depth and prompt engineering
  Hadi Daneshmand (25 Oct 2024) [OT]
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
  Yingqian Cui, Pengfei He, Xianfeng Tang, Qi He, Chen Luo, Jiliang Tang, Yue Xing (21 Oct 2024) [LRM]
Bayesian scaling laws for in-context learning
  Aryaman Arora, Dan Jurafsky, Christopher Potts, Noah D. Goodman (21 Oct 2024)
In-context learning and Occam's razor
  Eric Elmoznino, Tom Marty, Tejas Kasetty, Léo Gagnon, Sarthak Mittal, Mahan Fathi, Dhanya Sridhar, Guillaume Lajoie (17 Oct 2024)
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
  Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei (17 Oct 2024)
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
  Renpu Liu, Ruida Zhou, Cong Shen, Jing Yang (17 Oct 2024)
Context-Scaling versus Task-Scaling in In-Context Learning
  Amirhesam Abedsoltan, Adityanarayanan Radhakrishnan, Jingfeng Wu, M. Belkin (16 Oct 2024) [ReLM, LRM]
On the Training Convergence of Transformers for In-Context Classification
  Wei Shen, Ruida Zhou, Jing Yang, Cong Shen (15 Oct 2024)
A Theoretical Survey on Foundation Models
  Shi Fu, Yuzhu Chen, Yingjie Wang, Dacheng Tao (15 Oct 2024)
Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
  Bo Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song (15 Oct 2024)
Can In-context Learning Really Generalize to Out-of-distribution Tasks?
  Qixun Wang, Yifei Wang, Yisen Wang, Xianghua Ying (13 Oct 2024) [OOD]
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
  Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar (10 Oct 2024)
Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition
  Zheyang Xiong, Ziyang Cai, John Cooper, Albert Ge, Vasilis Papageorgiou, ..., Saurabh Agarwal, Grigorios G Chrysos, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos (08 Oct 2024) [LRM]