Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.07042
Cited By
Transformers learn through gradual rank increase
12 June 2023
Enric Boix-Adserà
Etai Littwin
Emmanuel Abbe
Samy Bengio
J. Susskind
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Transformers learn through gradual rank increase"
12 / 12 papers shown
Title
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang
Yingbin Liang
Jing Yang
55
0
0
02 May 2025
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Hongkang Li
Yihua Zhang
Shuai Zhang
Ming Wang
Sijia Liu
Pin-Yu Chen
MoMe
71
4
0
15 Apr 2025
A distributional simplicity bias in the learning dynamics of transformers
Riccardo Rende
Federica Gerace
Alessandro Laio
Sebastian Goldt
79
8
0
17 Feb 2025
DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations
Krishna Sri Ipsit Mantri
Carola-Bibiane Schönlieb
Bruno Ribeiro
Chaim Baskin
Moshe Eliasof
43
0
0
09 Feb 2025
Geometric Signatures of Compositionality Across a Language Model's Lifetime
Jin Hwa Lee
Thomas Jiralerspong
Lei Yu
Yoshua Bengio
Emily Cheng
CoGe
87
1
0
02 Oct 2024
Reasoning in Large Language Models: A Geometric Perspective
Romain Cosentino
Sarath Shekkizhar
LRM
44
2
0
02 Jul 2024
Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers
Lorenzo Tiberi
Francesca Mignacco
Kazuki Irie
H. Sompolinsky
44
6
0
24 May 2024
Saddle-to-Saddle Dynamics in Diagonal Linear Networks
Scott Pesme
Nicolas Flammarion
33
35
0
02 Apr 2023
SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics
Emmanuel Abbe
Enric Boix-Adserà
Theodor Misiakiewicz
FedML
MLT
84
73
0
21 Feb 2023
Learning Single-Index Models with Shallow Neural Networks
A. Bietti
Joan Bruna
Clayton Sanford
M. Song
170
68
0
27 Oct 2022
Neural Networks Efficiently Learn Low-Dimensional Representations with SGD
Alireza Mousavi-Hosseini
Sejun Park
M. Girotti
Ioannis Mitliagkas
Murat A. Erdogdu
MLT
324
48
0
29 Sep 2022
Implicit Regularization in Deep Tensor Factorization
P. Milanesi
Hachem Kadri
Stéphane Ayache
Thierry Artières
54
9
0
04 May 2021
1