arXiv:2306.04815 — Cited By
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, M. Belkin
7 June 2023
Papers citing "Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning" (13 papers shown):
1. Learning a Single Index Model from Anisotropic Data with vanilla Stochastic Gradient Descent — Guillaume Braun, Minh Ha Quang, Masaaki Imaizumi [MLT], 31 Mar 2025
2. Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos — Dayal Singh Kalra, Tianyu He, M. Barkeshli, 17 Feb 2025
3. From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks — Clémentine Dominé, Nicolas Anguita, A. Proca, Lukas Braun, D. Kunin, P. Mediano, Andrew M. Saxe, 22 Sep 2024
4. Does SGD really happen in tiny subspaces? — Minhak Song, Kwangjun Ahn, Chulhee Yun, 25 May 2024
5. Linear Recursive Feature Machines provably recover low-rank matrices — Adityanarayanan Radhakrishnan, Misha Belkin, Dmitriy Drusvyatskiy, 09 Jan 2024
6. From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression — Xuxing Chen, Krishnakumar Balasubramanian, Promit Ghosal, Bhavya Agrawalla, 02 Oct 2023
7. Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture — Libin Zhu, Chaoyue Liu, M. Belkin [GNN, AI4CE], 24 May 2022
8. Understanding Gradient Descent on Edge of Stability in Deep Learning — Sanjeev Arora, Zhiyuan Li, A. Panigrahi [MLT], 19 May 2022
9. Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect — Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao [AI4CE], 07 Oct 2021
10. Stochastic Training is Not Necessary for Generalization — Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein, 29 Sep 2021
11. The large learning rate phase of deep learning: the catapult mechanism — Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari [ODL], 04 Mar 2020
12. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima — N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang [ODL], 15 Sep 2016
13. Densely Connected Convolutional Networks — Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger [PINN, 3DV], 25 Aug 2016