Transformers learn to implement preconditioned gradient descent for in-context learning
Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, S. Sra
1 June 2023 · arXiv:2306.00297 · ODL
Papers citing "Transformers learn to implement preconditioned gradient descent for in-context learning" (showing 50 of 127):
- Optimization-Inspired Few-Shot Adaptation for Large Language Models. Boyan Gao, Xin Wang, Yibo Yang, David A. Clifton. 25 May 2025.
- Continuum Transformers Perform In-Context Learning by Operator Gradient Descent. Abhiti Mishra, Yash Patel, Ambuj Tewari. 23 May 2025.
- From Compression to Expansion: A Layerwise Analysis of In-Context Learning. Jiachen Jiang, Yuxin Dong, Jinxin Zhou, Zhihui Zhu. 22 May 2025.
- Out-of-Distribution Generalization of In-Context Learning: A Low-Dimensional Subspace Perspective. Soo Min Kwon, Alec S. Xu, Can Yaras, Laura Balzano, Qing Qu. 20 May 2025. [OOD]
- Adversarially Pretrained Transformers may be Universally Robust In-Context Learners. Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki. 20 May 2025. [AAML]
- Attention-based clustering. Rodrigo Maulen-Soto, Claire Boyer, Pierre Marion. 19 May 2025.
- Rethinking Invariance in In-context Learning. Lizhe Fang, Yifei Wang, Khashayar Gatmiry, Lei Fang, Yun Wang. 08 May 2025.
- How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias. Ruiquan Huang, Yingbin Liang, Jing Yang. 02 May 2025.
- Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations. Yiyou Sun, Y. Gai, Lijie Chen, Abhilasha Ravichander, Yejin Choi, D. Song. 17 Apr 2025. [HILM]
- Exact Learning Dynamics of In-Context Learning in Linear Transformers and Its Application to Non-Linear Transformers. Nischal Mainali, Lucas Teixeira. 17 Apr 2025.
- Can Pre-training Indicators Reliably Predict Fine-tuning Outcomes of LLMs? Hansi Zeng, Kai Hui, Honglei Zhuang, Zhen Qin, Zhenrui Yue, Hamed Zamani, Dana Alon. 16 Apr 2025.
- Reasoning without Regret. Tarun Chitra. 14 Apr 2025. [OffRL, LRM]
- Gating is Weighting: Understanding Gated Linear Attention through In-context Learning. Yingcong Li, Davoud Ataee Tarzanagh, A. S. Rawat, Maryam Fazel, Samet Oymak. 06 Apr 2025.
- Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B. Aleksandra Bakalova, Yana Veitsman, Xinting Huang, Michael Hahn. 31 Mar 2025.
- Experience Replay Addresses Loss of Plasticity in Continual Learning. Jiuqi Wang, Rohan Chandra, Shangtong Zhang. 25 Mar 2025. [CLL, KELM]
- Transformer-based Wireless Symbol Detection Over Fading Channels. Li Fan, Jing Yang, Cong Shen. 20 Mar 2025.
- Test-Time Training Provably Improves Transformers as In-context Learners. Halil Alperen Gozeten, M. E. Ildiz, Xuechen Zhang, Mahdi Soltanolkotabi, Marco Mondelli, Samet Oymak. 14 Mar 2025.
- When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective. Alireza Mousavi-Hosseini, Clayton Sanford, Denny Wu, Murat A. Erdogdu. 14 Mar 2025.
- Taming Knowledge Conflicts in Language Models. Gaotang Li, Yuzhong Chen, Hanghang Tong. 14 Mar 2025. [KELM]
- Provable Benefits of Task-Specific Prompts for In-context Learning. Xiangyu Chang, Yingcong Li, Muti Kara, Samet Oymak, Amit K. Roy-Chowdhury. 03 Mar 2025.
- In-Context Learning with Hypothesis-Class Guidance. Ziqian Lin, Shubham Kumar Bharti, Kangwook Lee. 27 Feb 2025.
- On the Robustness of Transformers against Context Hijacking for Linear Classification. Tianle Li, Chenyang Zhang, Xingwu Chen, Yuan Cao, Difan Zou. 24 Feb 2025.
- Towards Auto-Regressive Next-Token Prediction: In-Context Learning Emerges from Generalization. Zixuan Gong, Xiaolin Hu, Huayi Tang, Yong Liu. 24 Feb 2025.
- Ask, and it shall be given: On the Turing completeness of prompting. Ruizhong Qiu, Zhe Xu, Wenxuan Bao, Hanghang Tong. 24 Feb 2025. [ReLM, LRM, AI4CE]
- CoT-ICL Lab: A Synthetic Framework for Studying Chain-of-Thought Learning from In-Context Demonstrations. Vignesh Kothapalli, Hamed Firooz, Maziar Sanjabi. 21 Feb 2025.
- Looped ReLU MLPs May Be All You Need as Practical Programmable Computers. Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou. 21 Feb 2025.
- Vector-ICL: In-context Learning with Continuous Vector Representations. Yufan Zhuang, Chandan Singh, Liyuan Liu, Jingbo Shang, Jianfeng Gao. 21 Feb 2025.
- Transformers versus the EM Algorithm in Multi-class Clustering. Yihan He, Hong-Yu Chen, Yuan Cao, Jianqing Fan, Han Liu. 09 Feb 2025.
- Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data? Yutong Yin, Zhaoran Wang. 27 Jan 2025. [LRM, ReLM]
- Training Dynamics of In-Context Learning in Linear Attention. Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe. 27 Jan 2025. [MLT]
- Understanding Knowledge Hijack Mechanism in In-context Learning through Associative Memory. Shuo Wang, Issei Sato. 16 Dec 2024.
- 4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion. Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo, Willi Menapace, Aliaksandr Siarohin, Michael Vasilkovsky, Ivan Skorokhodov, Sergey Tulyakov, Peter Wonka, Hsin-Ying Lee. 05 Dec 2024. [DiffM, VGen]
- Re-examining learning linear functions in context. Omar Naim, Guilhem Fouilhé, Nicholas Asher. 18 Nov 2024.
- One-Layer Transformer Provably Learns One-Nearest Neighbor In Context. Zihao Li, Yuan Cao, Cheng Gao, Yihan He, Han Liu, Jason M. Klusowski, Jianqing Fan, Mengdi Wang. 16 Nov 2024. [MLT]
- Pretrained transformer efficiently learns low-dimensional target functions in-context. Kazusato Oko, Yujin Song, Taiji Suzuki, Denny Wu. 04 Nov 2024.
- Abrupt Learning in Transformers: A Case Study on Matrix Completion. Pulkit Gopalani, Ekdeep Singh Lubana, Wei Hu. 29 Oct 2024.
- On the Role of Depth and Looping for In-Context Learning with Task Diversity. Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar. 29 Oct 2024.
- Provable optimal transport with transformers: The essence of depth and prompt engineering. Hadi Daneshmand. 25 Oct 2024. [OT]
- A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration. Yingqian Cui, Pengfei He, Xianfeng Tang, Qi He, Chen Luo, Jiliang Tang, Yue Xing. 21 Oct 2024. [LRM]
- Bayesian scaling laws for in-context learning. Aryaman Arora, Dan Jurafsky, Christopher Potts, Noah D. Goodman. 21 Oct 2024.
- In-context learning and Occam's razor. Eric Elmoznino, Tom Marty, Tejas Kasetty, Léo Gagnon, Sarthak Mittal, Mahan Fathi, Dhanya Sridhar, Guillaume Lajoie. 17 Oct 2024.
- Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs. Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei. 17 Oct 2024.
- On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery. Renpu Liu, Ruida Zhou, Cong Shen, Jing Yang. 17 Oct 2024.
- Context-Scaling versus Task-Scaling in In-Context Learning. Amirhesam Abedsoltan, Adityanarayanan Radhakrishnan, Jingfeng Wu, M. Belkin. 16 Oct 2024. [ReLM, LRM]
- On the Training Convergence of Transformers for In-Context Classification. Wei Shen, Ruida Zhou, Jing Yang, Cong Shen. 15 Oct 2024.
- A Theoretical Survey on Foundation Models. Shi Fu, Yuzhu Chen, Yingjie Wang, Dacheng Tao. 15 Oct 2024.
- Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent. Bo Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song. 15 Oct 2024.
- Can In-context Learning Really Generalize to Out-of-distribution Tasks? Qixun Wang, Yifei Wang, Yisen Wang, Xianghua Ying. 13 Oct 2024. [OOD]
- Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning? Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar. 10 Oct 2024.
- Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition. Zheyang Xiong, Ziyang Cai, John Cooper, Albert Ge, Vasilis Papageorgiou, ..., Saurabh Agarwal, Grigorios G Chrysos, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos. 08 Oct 2024. [LRM]