2501.16265
Cited By
Training Dynamics of In-Context Learning in Linear Attention
27 January 2025
Yedi Zhang
Aaditya K. Singh
Peter E. Latham
Andrew Saxe
MLT
Papers citing
"Training Dynamics of In-Context Learning in Linear Attention"
9 / 9 papers shown
Title
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
D. Kunin
Giovanni Luca Marchetti
F. Chen
Dhruva Karkada
James B. Simon
M. DeWeese
Surya Ganguli
Nina Miolane
34
0
0
06 Jun 2025
The emergence of sparse attention: impact of data distribution and benefits of repetition
Nicolas Zucchet
Francesco D'Angelo
Andrew Kyle Lampinen
Stephanie C. Y. Chan
214
1
0
23 May 2025
Strategy Coopetition Explains the Emergence and Transience of In-Context Learning
Aaditya K. Singh
Ted Moskovitz
Sara Dragutinovic
Felix Hill
Stephanie C. Y. Chan
Andrew Saxe
443
5
0
07 Mar 2025
Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
Jianhao Huang
Zixuan Wang
Jason D. Lee
LRM
103
3
0
28 Feb 2025
A distributional simplicity bias in the learning dynamics of transformers
Riccardo Rende
Federica Gerace
Alessandro Laio
Sebastian Goldt
127
9
0
17 Feb 2025
Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting
Suraj Anand
Michael A. Lepori
Jack Merullo
Ellie Pavlick
CLL
122
8
0
28 May 2024
Asymptotic theory of in-context learning by linear attention
Yue M. Lu
Mary I. Letey
Jacob A. Zavatone-Veth
Anindita Maiti
Cengiz Pehlevan
98
16
0
20 May 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
117
21
0
08 Feb 2024
Geometry of Linear Neural Networks: Equivariance and Invariance under Permutation Groups
Kathlén Kohn
Anna-Laura Sattelberger
Vahid Shahverdi
97
4
0
24 Sep 2023