1312.6120
Cited By
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
20 December 2013
Andrew M. Saxe
James L. McClelland
Surya Ganguli
ODL
Papers citing
"Exact solutions to the nonlinear dynamics of learning in deep linear neural networks"
41 / 41 papers shown
Title
The emergence of sparse attention: impact of data distribution and benefits of repetition
Nicolas Zucchet
Francesco D'Angelo
Andrew Kyle Lampinen
Stephanie C. Y. Chan
86
0
0
23 May 2025
Accelerating Learned Image Compression Through Modeling Neural Training Dynamics
Yichi Zhang
Zhihao Duan
Yuning Huang
Fengqing Zhu
137
0
0
23 May 2025
Sinusoidal Initialization, Time for a New Start
Alberto Fernández-Hernández
Jose I. Mestre
Manuel F. Dolz
Jose Duato
Enrique S. Quintana-Ortí
ODL
AI4CE
112
0
0
19 May 2025
A Local Polyak-Lojasiewicz and Descent Lemma of Gradient Descent For Overparametrized Linear Models
Ziqing Xu
Hancheng Min
Salma Tarmoun
Enrique Mallada
Rene Vidal
65
0
0
16 May 2025
Shrinkage Initialization for Smooth Learning of Neural Networks
Miao Cheng
Feiyan Zhou
Hongwei Zou
Limin Wang
AI4CE
50
0
0
12 Apr 2025
Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model
Moritz A. Zanger
Pascal R. van der Vaart
Wendelin Böhmer
M. Spaan
UQCV
BDL
357
1
0
14 Mar 2025
Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
Yoonsoo Nam
Seok Hyeong Lee
Clémentine Dominé
Yea Chan Park
Charles London
Wonyl Choi
Niclas Goring
Seungjai Lee
AI4CE
104
0
0
28 Feb 2025
Training Large Neural Networks With Low-Dimensional Error Feedback
Maher Hanut
Jonathan Kadmon
76
1
0
27 Feb 2025
Stacking as Accelerated Gradient Descent
Naman Agarwal
Pranjal Awasthi
Satyen Kale
Eric Zhao
ODL
94
2
0
20 Feb 2025
A distributional simplicity bias in the learning dynamics of transformers
Riccardo Rende
Federica Gerace
Alessandro Laio
Sebastian Goldt
92
8
0
17 Feb 2025
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Dayal Singh Kalra
Tianyu He
M. Barkeshli
89
4
0
17 Feb 2025
On Vanishing Gradients, Over-Smoothing, and Over-Squashing in GNNs: Bridging Recurrent and Graph Learning
Alvaro Arroyo
Alessio Gravina
Benjamin Gutteridge
Federico Barbero
Claudio Gallicchio
Xiaowen Dong
Michael M. Bronstein
P. Vandergheynst
85
8
0
15 Feb 2025
Deep Linear Network Training Dynamics from Random Initialization: Data, Width, Depth, and Hyperparameter Transfer
Blake Bordelon
Cengiz Pehlevan
AI4CE
116
1
0
04 Feb 2025
A theoretical framework for overfitting in energy-based modeling
Giovanni Catania
A. Decelle
Cyril Furtlehner
Beatriz Seoane
103
2
0
31 Jan 2025
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
Pierfrancesco Beneventano
Blake Woodworth
MLT
73
1
0
15 Jan 2025
Singular Value Scaling: Efficient Generative Model Compression via Pruned Weights Refinement
H. Kim
Jaejun Yoo
85
0
0
23 Dec 2024
Pretraining with random noise for uncertainty calibration
Jeonghwan Cheon
Se-Bum Paik
OnRL
91
1
0
23 Dec 2024
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Jim Zhao
Sidak Pal Singh
Aurelien Lucchi
AI4CE
78
0
0
04 Nov 2024
Bilinear Sequence Regression: A Model for Learning from Long Sequences of High-dimensional Tokens
Vittorio Erba
Emanuele Troiani
Luca Biggio
Antoine Maillard
Lenka Zdeborová
122
1
0
24 Oct 2024
Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models
Yuheng Lu
Bingshuo Qian
Caixia Yuan
Huixing Jiang
Xiaojie Wang
CLL
58
0
0
22 Oct 2024
How Feature Learning Can Improve Neural Scaling Laws
Blake Bordelon
Alexander B. Atanasov
Cengiz Pehlevan
70
14
0
26 Sep 2024
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
Clémentine Dominé
Nicolas Anguita
A. Proca
Lukas Braun
D. Kunin
P. Mediano
Andrew M. Saxe
67
3
0
22 Sep 2024
Remove Symmetries to Control Model Expressivity and Improve Optimization
Liu Ziyin
Yizhou Xu
Isaac Chuang
AAML
62
1
0
28 Aug 2024
InfoNCE: Identifying the Gap Between Theory and Practice
E. Rusak
Patrik Reizinger
Attila Juhos
Oliver Bringmann
Roland S. Zimmermann
Wieland Brendel
73
7
0
28 Jun 2024
Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles
Jiesong Lian
Yucong Huang
Chengdong Ma
Mingzhi Wang
Ying Wen
Long Hu
Yixue Hao
77
0
0
31 May 2024
Pretraining with Random Noise for Fast and Robust Learning without Weight Transport
Jeonghwan Cheon
Sang Wan Lee
Se-Bum Paik
OOD
339
2
0
27 May 2024
Cascade of phase transitions in the training of Energy-based models
Dimitrios Bachtis
Giulio Biroli
A. Decelle
Beatriz Seoane
52
4
0
23 May 2024
Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations
Akshay Kumar
Jarvis Haupt
ODL
57
3
0
12 Mar 2024
Learning time-scales in two-layers neural networks
Raphael Berthier
Andrea Montanari
Kangjie Zhou
86
33
0
28 Feb 2023
KOALA: A Kalman Optimization Algorithm with Loss Adaptivity
A. Davtyan
Sepehr Sameni
L. Cerkezi
Givi Meishvili
Adam Bielski
Paolo Favaro
ODL
93
2
0
07 Jul 2021
AMEIR: Automatic Behavior Modeling, Interaction Exploration and MLP Investigation in the Recommender System
Pengyu Zhao
Kecheng Xiao
Yuanxing Zhang
Kaigui Bian
Wei Yan
63
16
0
10 Jun 2020
Two Routes to Scalable Credit Assignment without Weight Symmetry
D. Kunin
Aran Nayebi
Javier Sagastuy-Breña
Surya Ganguli
Jonathan M. Bloom
Daniel L. K. Yamins
86
33
0
28 Feb 2020
Attributed Sequence Embedding
Zhongfang Zhuang
Xiangnan Kong
Elke A. Rundensteiner
Jihane Zouaoui
Aditya Arora
142
12
0
03 Nov 2019
Generalization in multitask deep neural classifiers: a statistical physics approach
Tyler Lee
A. Ndirango
AI4CE
118
20
0
30 Oct 2019
All You Need is Beyond a Good Init: Exploring Better Solution for Training Extremely Deep Convolutional Neural Networks with Orthonormality and Modulation
Di Xie
Jiang Xiong
Shiliang Pu
86
182
0
06 Mar 2017
An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis
Yuandong Tian
MLT
114
216
0
02 Mar 2017
Exponentially vanishing sub-optimal local minima in multilayer neural networks
Daniel Soudry
Elad Hoffer
108
97
0
19 Feb 2017
Big Neural Networks Waste Capacity
Yann N. Dauphin
Yoshua Bengio
70
84
0
16 Jan 2013
On the difficulty of training Recurrent Neural Networks
Razvan Pascanu
Tomas Mikolov
Yoshua Bengio
ODL
132
5,318
0
21 Nov 2012
Multi-column Deep Neural Networks for Image Classification
D. Ciresan
U. Meier
Jürgen Schmidhuber
111
3,935
0
13 Feb 2012
Building high-level features using large scale unsupervised learning
Quoc V. Le
Marc'Aurelio Ranzato
R. Monga
M. Devin
Kai Chen
G. Corrado
J. Dean
A. Ng
SSL
OffRL
CVBM
93
2,268
0
29 Dec 2011