Cited By
Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability (arXiv:2103.00065)
Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar · ODL · 26 February 2021
Papers citing "Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability"
Showing 29 of 79 citing papers.
Learning threshold neurons via the "edge of stability"
Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang · MLT · 14 Dec 2022
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
Ziqiao Wang, Yongyi Mao · 19 Nov 2022
Grokking phase transitions in learning local rules with gradient descent
Bojan Žunkovič, E. Ilievski · 26 Oct 2022
Optimisation & Generalisation in Networks of Neurons
Jeremy Bernstein
AI4CE
24
2
0
18 Oct 2022
SGD with Large Step Sizes Learns Sparse Features
Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion · 11 Oct 2022
Second-order regression models exhibit progressive sharpening to the edge of stability
Atish Agarwala, Fabian Pedregosa, Jeffrey Pennington · 10 Oct 2022
On skip connections and normalisation layers in deep optimisation
L. MacDonald, Jack Valmadre, Hemanth Saratchandran, Simon Lucey · ODL · 10 Oct 2022
Understanding Edge-of-Stability Training Dynamics with a Minimalist Example
Xingyu Zhu, Zixuan Wang, Xiang Wang, Mo Zhou, Rong Ge · 07 Oct 2022
The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima
Peter L. Bartlett, Philip M. Long, Olivier Bousquet · 04 Oct 2022
A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases
James Harrison, Luke Metz, Jascha Narain Sohl-Dickstein · 22 Sep 2022
On the Implicit Bias in Deep-Learning Algorithms
Gal Vardi · FedML, AI4CE · 26 Aug 2022
On the generalization of learning algorithms that do not converge
N. Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie Jegelka · MLT · 16 Aug 2022
Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability
Z. Li, Zixuan Wang, Jian Li · 26 Jul 2022
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora · FAtt · 14 Jun 2022
High-dimensional limit theorems for SGD: Effective dynamics and critical scaling
Gerard Ben Arous, Reza Gheissari, Aukosh Jagannath · 08 Jun 2022
Batch Normalization Is Blind to the First and Second Derivatives of the Loss
Zhanpeng Zhou, Wen Shen, Huixin Chen, Ling Tang, Quanshi Zhang · 30 May 2022
High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang · MLT · 03 May 2022
Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes
Chao Ma, D. Kunin, Lei Wu, Lexing Ying · 24 Apr 2022
Understanding the unstable convergence of gradient descent
Kwangjun Ahn, J.N. Zhang, S. Sra · 03 Apr 2022
On the Benefits of Large Learning Rates for Kernel Methods
Gaspard Beugnot, Julien Mairal, Alessandro Rudi · 28 Feb 2022
Robust Training of Neural Networks Using Scale Invariant Architectures
Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Sanjiv Kumar · 02 Feb 2022
Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations
Jiayao Zhang, Hua Wang, Weijie J. Su · 11 Oct 2021
A Loss Curvature Perspective on Training Instability in Deep Learning
Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David E. Cardoze, George E. Dahl, Zachary Nado, Orhan Firat · ODL · 08 Oct 2021
Stochastic Training is Not Necessary for Generalization
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein · 29 Sep 2021
The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins · 19 Jul 2021
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li, Sadhika Malladi, Sanjeev Arora · 24 Feb 2021
Provable Super-Convergence with a Large Cyclical Learning Rate
Samet Oymak · 22 Feb 2021
The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari · ODL · 04 Mar 2020
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · ODL · 15 Sep 2016