ResearchTrend.AI
Training Dynamics of In-Context Learning in Linear Attention
arXiv: 2501.16265 (v2, latest)
27 January 2025
Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe
Topic: MLT

Papers citing "Training Dynamics of In-Context Learning in Linear Attention" (9 papers)
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
D. Kunin, Giovanni Luca Marchetti, F. Chen, Dhruva Karkada, James B. Simon, M. DeWeese, Surya Ganguli, Nina Miolane
06 Jun 2025
The emergence of sparse attention: impact of data distribution and benefits of repetition
Nicolas Zucchet, Francesco d'Angelo, Andrew Kyle Lampinen, Stephanie C. Y. Chan
23 May 2025
Strategy Coopetition Explains the Emergence and Transience of In-Context Learning
Aaditya K. Singh, Ted Moskovitz, Sara Dragutinovic, Felix Hill, Stephanie C. Y. Chan, Andrew Saxe
07 Mar 2025
Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought (LRM)
Jianhao Huang, Zixuan Wang, Jason D. Lee
28 Feb 2025
A distributional simplicity bias in the learning dynamics of transformers
Riccardo Rende, Federica Gerace, Alessandro Laio, Sebastian Goldt
17 Feb 2025
Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting (CLL)
Suraj Anand, Michael A. Lepori, Jack Merullo, Ellie Pavlick
28 May 2024
Asymptotic theory of in-context learning by linear attention
Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, Cengiz Pehlevan
20 May 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis
08 Feb 2024
Geometry of Linear Neural Networks: Equivariance and Invariance under Permutation Groups
Kathlén Kohn, Anna-Laura Sattelberger, Vahid Shahverdi
24 Sep 2023