Transformers learn to implement preconditioned gradient descent for in-context learning [ODL]
Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, S. Sra (1 June 2023)
arXiv: 2306.00297
Papers citing "Transformers learn to implement preconditioned gradient descent for in-context learning"
(showing 50 of 127 citing papers)
Transformers learn variable-order Markov chains in-context
Ruida Zhou, C. Tian, Suhas Diggavi (07 Oct 2024)

Task Diversity Shortens the ICL Plateau [MoMe]
Jaeyeon Kim, Sehyun Kwon, Joo Young Choi, Jongho Park, Jaewoong Cho, Jason D. Lee, Ernest K. Ryu (07 Oct 2024)

Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization
Xinhao Yao, Hongjin Qian, Xiaolin Hu, Gengze Xu, Wei Liu, Jian Luan, Bin Wang, Yang Liu (03 Oct 2024)

Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis [LRM]
Hongkang Li, Songtao Lu, Pin-Yu Chen, Xiaodong Cui, Meng Wang (03 Oct 2024)

Towards Understanding the Universality of Transformers for Next-Token Prediction [CML]
Michael E. Sander, Gabriel Peyré (03 Oct 2024)

Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context [MLT]
Spencer Frei, Gal Vardi (02 Oct 2024)

Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models
Can Demircan, Tankred Saanum, Akshay K. Jagadish, Marcel Binz, Eric Schulz (02 Oct 2024)

Attention layers provably solve single-location regression
Pierre Marion, Raphael Berthier, Gérard Biau, Claire Boyer (02 Oct 2024)

Transformers Handle Endogeneity in In-Context Linear Regression
Haodong Liang, Krishnakumar Balasubramanian, Lifeng Lai (02 Oct 2024)

Non-asymptotic Convergence of Training Transformers for Next-token Prediction
Ruiquan Huang, Yingbin Liang, Jing Yang (25 Sep 2024)

In-Context Learning of Linear Systems: Generalization Theory and Applications to Operator Learning
Frank Cole, Yulong Lu, Wuzhe Xu, Tianhao Zhang (18 Sep 2024)

Transformers are Minimax Optimal Nonparametric In-Context Learners
Juno Kim, Tai Nakamaki, Taiji Suzuki (22 Aug 2024)

How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
Xingwu Chen, Lei Zhao, Difan Zou (08 Aug 2024)

Transformers are Universal In-context Learners
Takashi Furuya, Maarten V. de Hoop, Gabriel Peyré (02 Aug 2024)

Representing Rule-based Chatbots with Transformers
Dan Friedman, Abhishek Panigrahi, Danqi Chen (15 Jul 2024)

Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond
Yingcong Li, A. S. Rawat, Samet Oymak (13 Jul 2024)

HiPPO-Prophecy: State-Space Models can Provably Learn Dynamical Systems in Context
Federico Arangath Joseph, K. Haefeli, Noah Liniger, Çağlar Gülçehre (12 Jul 2024)

Distributed Rule Vectors is A Key Mechanism in Large Language Models' In-Context Learning
Bowen Zheng, Ming Ma, Zhongqiao Lin, Tianming Yang (23 Jun 2024)

Probing the Decision Boundaries of In-context Learning in Large Language Models
Siyan Zhao, Tung Nguyen, Aditya Grover (17 Jun 2024)

Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective
Xinhao Yao, Xiaolin Hu, Shenzhi Yang, Yong Liu (06 Jun 2024)

Universal In-Context Approximation By Prompting Fully Recurrent Models [LRM]
Aleksandar Petrov, Tom A. Lamb, Alasdair Paren, Philip Torr, Adel Bibi (03 Jun 2024)

Why Larger Language Models Do In-context Learning Differently?
Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang (30 May 2024)

A Theoretical Understanding of Self-Correction through In-context Alignment [LRM]
Yifei Wang, Yuyang Wu, Zeming Wei, Stefanie Jegelka, Yisen Wang (28 May 2024)

IM-Context: In-Context Learning for Imbalanced Regression Tasks
Ismail Nejjar, Faez Ahmed, Olga Fink (28 May 2024)

On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability
Chenyu Zheng, Wei Huang, Rongzheng Wang, Guoqiang Wu, Jun Zhu, Chongxuan Li (27 May 2024)

Automatic Domain Adaptation by Transformers in In-Context Learning
Ryuichiro Hataya, Kota Matsui, Masaaki Imaizumi (27 May 2024)

On Understanding Attention-Based In-Context Learning for Categorical Data
Aaron T. Wang, William Convertino, Xiang Cheng, Ricardo Henao, Lawrence Carin (27 May 2024)

Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification [UQCV]
Shang Liu, Zhongze Cai, Guanting Chen, Xiaocheng Li (24 May 2024)

DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning
Zijian Zhou, Xiaoqiang Lin, Xinyi Xu, Alok Prakash, Daniela Rus, K. H. Low (22 May 2024)

Asymptotic theory of in-context learning by linear attention
Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, Cengiz Pehlevan (20 May 2024)

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
Robert Vacareanu, Vlad-Andrei Negru, Vasile Suciu, Mihai Surdeanu (11 Apr 2024)

Can large language models explore in-context? [LM&Ro, LLMAG, LRM]
Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins (22 Mar 2024)

Transfer Learning Beyond Bounded Density Ratios
Alkis Kalavasis, Ilias Zadik, Manolis Zampetakis (18 Mar 2024)

How Well Can Transformers Emulate In-context Newton's Method?
Angeliki Giannou, Liu Yang, Tianhao Wang, Dimitris Papailiopoulos, Jason D. Lee (05 Mar 2024)

How Do Nonlinear Transformers Learn and Generalize in In-Context Learning? [MLT]
Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen (23 Feb 2024)

In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization
Ruiqi Zhang, Jingfeng Wu, Peter L. Bartlett (22 Feb 2024)

Linear Transformers are Versatile In-Context Learners
Max Vladymyrov, J. Oswald, Mark Sandler, Rong Ge (21 Feb 2024)

How do Transformers perform In-Context Autoregressive Learning?
Michael E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré (08 Feb 2024)

Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos (06 Feb 2024)

Is Mamba Capable of In-Context Learning?
Riccardo Grazzi, Julien N. Siems, Simon Schrodi, Thomas Brox, Frank Hutter (05 Feb 2024)

Can MLLMs Perform Text-to-Image In-Context Learning? [MLLM]
Yuchen Zeng, Wonjun Kang, Yicong Chen, Hyung Il Koo, Kangwook Lee (02 Feb 2024)

Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape
Juno Kim, Taiji Suzuki (02 Feb 2024)

Theoretical Understanding of In-Context Learning in Shallow Transformers with Unstructured Data
Yue Xing, Xiaofeng Lin, Chenheng Xu, Namjoon Suh, Qifan Song, Guang Cheng (01 Feb 2024)

Superiority of Multi-Head Attention in In-Context Linear Regression
Yingqian Cui, Jie Ren, Pengfei He, Jiliang Tang, Yue Xing (30 Jan 2024)

An Information-Theoretic Analysis of In-Context Learning
Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy (28 Jan 2024)

Anchor function: a type of benchmark functions for studying language models
Zhongwang Zhang, Zhiwei Wang, Junjie Yao, Zhangchen Zhou, Xiaolong Li, E. Weinan, Z. Xu (16 Jan 2024)

Setting the Record Straight on Transformer Oversmoothing
G. Dovonon, M. Bronstein, Matt J. Kusner (09 Jan 2024)

Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
Xiang Cheng, Yuxin Chen, S. Sra (11 Dec 2023)

The mechanistic basis of data dependence and abrupt learning in an in-context classification task
Gautam Reddy (03 Dec 2023)

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks [CoGe]
Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, Hidenori Tanaka (21 Nov 2023)