ResearchTrend.AI
Training Dynamics of In-Context Learning in Linear Attention
arXiv: 2501.16265 (v2, latest)
27 January 2025
Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe
Topic: MLT

Papers citing "Training Dynamics of In-Context Learning in Linear Attention" (9 papers)
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
D. Kunin, Giovanni Luca Marchetti, F. Chen, Dhruva Karkada, James B. Simon, M. DeWeese, Surya Ganguli, Nina Miolane
06 Jun 2025
The emergence of sparse attention: impact of data distribution and benefits of repetition
Nicolas Zucchet, Francesco d'Angelo, Andrew Kyle Lampinen, Stephanie C. Y. Chan
23 May 2025
Strategy Coopetition Explains the Emergence and Transience of In-Context Learning
Aaditya K. Singh, Ted Moskovitz, Sara Dragutinovic, Felix Hill, Stephanie C. Y. Chan, Andrew Saxe
07 Mar 2025
Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought (LRM)
Jianhao Huang, Zixuan Wang, Jason D. Lee
28 Feb 2025
A distributional simplicity bias in the learning dynamics of transformers
Riccardo Rende, Federica Gerace, Alessandro Laio, Sebastian Goldt
17 Feb 2025
Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting (CLL)
Suraj Anand, Michael A. Lepori, Jack Merullo, Ellie Pavlick
28 May 2024
Asymptotic theory of in-context learning by linear attention
Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, Cengiz Pehlevan
20 May 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis
08 Feb 2024
Geometry of Linear Neural Networks: Equivariance and Invariance under Permutation Groups
Kathlén Kohn, Anna-Laura Sattelberger, Vahid Shahverdi
24 Sep 2023