The large learning rate phase of deep learning: the catapult mechanism
4 March 2020
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
arXiv: 2003.02218

Papers citing "The large learning rate phase of deep learning: the catapult mechanism" (38 papers)

Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett (05 Apr 2025)

On the Cone Effect in the Learning Dynamics
Zhanpeng Zhou, Yongyi Yang, Jie Ren, Mahito Sugiyama, Junchi Yan (20 Mar 2025)

Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Dayal Singh Kalra, Tianyu He, M. Barkeshli (17 Feb 2025)

Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
Pierfrancesco Beneventano, Blake Woodworth (15 Jan 2025)

Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Jim Zhao, Sidak Pal Singh, Aurelien Lucchi (04 Nov 2024)

SGD with memory: fundamental properties and stochastic acceleration
Dmitry Yarotsky, Maksim Velikanov (05 Oct 2024)

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
Clémentine Dominé, Nicolas Anguita, A. Proca, Lukas Braun, D. Kunin, P. Mediano, Andrew M. Saxe (22 Sep 2024)

The Two Regimes of Deep Network Training
Guillaume Leclerc, Aleksander Madry (24 Feb 2020)

The Early Phase of Neural Network Training
Jonathan Frankle, D. Schwab, Ari S. Morcos (24 Feb 2020)

The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras (21 Feb 2020)

A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie, Issei Sato, Masashi Sugiyama (10 Feb 2020)

Disentangling Trainability and Generalization in Deep Neural Networks
Lechao Xiao, Jeffrey Pennington, S. Schoenholz (30 Dec 2019)

Neural Tangents: Fast and Easy Infinite Neural Networks in Python
Roman Novak, Lechao Xiao, Jiri Hron, Jaehoon Lee, Alexander A. Alemi, Jascha Narain Sohl-Dickstein, S. Schoenholz (05 Dec 2019)

Fantastic Generalization Measures and Where to Find Them
Yiding Jiang, Behnam Neyshabur, H. Mobahi, Dilip Krishnan, Samy Bengio (04 Dec 2019)

Asymptotics of Wide Networks from Feynman Diagrams
Ethan Dyer, Guy Gur-Ari (25 Sep 2019)

Dynamics of Deep Neural Networks and Neural Tangent Hierarchy
Jiaoyang Huang, H. Yau (18 Sep 2019)

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Yuanzhi Li, Colin Wei, Tengyu Ma (10 Jul 2019)

Kernel and Rich Regimes in Overparametrized Models
Blake E. Woodworth, Suriya Gunasekar, Pedro H. P. Savarese, E. Moroshko, Itay Golan, Jason D. Lee, Daniel Soudry, Nathan Srebro (13 Jun 2019)

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
Daniel S. Park, Jascha Narain Sohl-Dickstein, Quoc V. Le, Samuel L. Smith (09 May 2019)

On Exact Computation with an Infinitely Wide Neural Net
Sanjeev Arora, S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang (26 Apr 2019)

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
Jaehoon Lee, Lechao Xiao, S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Narain Sohl-Dickstein, Jeffrey Pennington (18 Feb 2019)

On Lazy Training in Differentiable Programming
Lénaïc Chizat, Edouard Oyallon, Francis R. Bach (19 Dec 2018)

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu (21 Nov 2018)

A Convergence Theory for Deep Learning via Over-Parameterization
Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song (09 Nov 2018)

Gradient Descent Finds Global Minima of Deep Neural Networks
S. Du, Jason D. Lee, Haochuan Li, Liwei Wang, Masayoshi Tomizuka (09 Nov 2018)

Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes
Roman Novak, Lechao Xiao, Jaehoon Lee, Yasaman Bahri, Greg Yang, Jiri Hron, Daniel A. Abolafia, Jeffrey Pennington, Jascha Narain Sohl-Dickstein (11 Oct 2018)

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
Yuanzhi Li, Yingyu Liang (03 Aug 2018)

Stochastic natural gradient descent draws posterior samples in function space
Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Narain Sohl-Dickstein (25 Jun 2018)

Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Arthur Jacot, Franck Gabriel, Clément Hongler (20 Jun 2018)

A Mean Field View of the Landscape of Two-Layers Neural Networks
Song Mei, Andrea Montanari, Phan-Minh Nguyen (18 Apr 2018)

Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le (01 Nov 2017)

Deep Neural Networks as Gaussian Processes
Jaehoon Lee, Yasaman Bahri, Roman Novak, S. Schoenholz, Jeffrey Pennington, Jascha Narain Sohl-Dickstein (01 Nov 2017)

A Bayesian Perspective on Generalization and Stochastic Gradient Descent
Samuel L. Smith, Quoc V. Le (17 Oct 2017)

Stochastic Gradient Descent as Approximate Bayesian Inference
Stephan Mandt, Matthew D. Hoffman, David M. Blei (13 Apr 2017)

Sharp Minima Can Generalize For Deep Nets
Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio (15 Mar 2017)

SGD Learns the Conjugate Kernel Class of the Network
Amit Daniely (27 Feb 2017)

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang (15 Sep 2016)

Wide Residual Networks
Sergey Zagoruyko, N. Komodakis (23 May 2016)