The large learning rate phase of deep learning: the catapult mechanism
4 March 2020
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
arXiv: 2003.02218

Papers citing "The large learning rate phase of deep learning: the catapult mechanism" (38 papers)

Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett (05 Apr 2025)

On the Cone Effect in the Learning Dynamics
Zhanpeng Zhou, Yongyi Yang, Jie Ren, Mahito Sugiyama, Junchi Yan (20 Mar 2025)

Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Dayal Singh Kalra, Tianyu He, M. Barkeshli (17 Feb 2025)

Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
Pierfrancesco Beneventano, Blake Woodworth (15 Jan 2025)

Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Jim Zhao, Sidak Pal Singh, Aurelien Lucchi (04 Nov 2024)

SGD with memory: fundamental properties and stochastic acceleration
Dmitry Yarotsky, Maksim Velikanov (05 Oct 2024)

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
Clémentine Dominé, Nicolas Anguita, A. Proca, Lukas Braun, D. Kunin, P. Mediano, Andrew M. Saxe (22 Sep 2024)

The Two Regimes of Deep Network Training
Guillaume Leclerc, Aleksander Madry (24 Feb 2020)

The Early Phase of Neural Network Training
Jonathan Frankle, D. Schwab, Ari S. Morcos (24 Feb 2020)

The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras (21 Feb 2020)

A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie, Issei Sato, Masashi Sugiyama (10 Feb 2020)

Disentangling Trainability and Generalization in Deep Neural Networks
Lechao Xiao, Jeffrey Pennington, S. Schoenholz (30 Dec 2019)

Neural Tangents: Fast and Easy Infinite Neural Networks in Python
Roman Novak, Lechao Xiao, Jiri Hron, Jaehoon Lee, Alexander A. Alemi, Jascha Narain Sohl-Dickstein, S. Schoenholz (05 Dec 2019)

Fantastic Generalization Measures and Where to Find Them
Yiding Jiang, Behnam Neyshabur, H. Mobahi, Dilip Krishnan, Samy Bengio (04 Dec 2019)

Asymptotics of Wide Networks from Feynman Diagrams
Ethan Dyer, Guy Gur-Ari (25 Sep 2019)

Dynamics of Deep Neural Networks and Neural Tangent Hierarchy
Jiaoyang Huang, H. Yau (18 Sep 2019)

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Yuanzhi Li, Colin Wei, Tengyu Ma (10 Jul 2019)

Kernel and Rich Regimes in Overparametrized Models
Blake E. Woodworth, Suriya Gunasekar, Pedro H. P. Savarese, E. Moroshko, Itay Golan, Jason D. Lee, Daniel Soudry, Nathan Srebro (13 Jun 2019)

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
Daniel S. Park, Jascha Narain Sohl-Dickstein, Quoc V. Le, Samuel L. Smith (09 May 2019)

On Exact Computation with an Infinitely Wide Neural Net
Sanjeev Arora, S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang (26 Apr 2019)

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
Jaehoon Lee, Lechao Xiao, S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Narain Sohl-Dickstein, Jeffrey Pennington (18 Feb 2019)

On Lazy Training in Differentiable Programming
Lénaïc Chizat, Edouard Oyallon, Francis R. Bach (19 Dec 2018)

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu (21 Nov 2018)

A Convergence Theory for Deep Learning via Over-Parameterization
Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song (09 Nov 2018)

Gradient Descent Finds Global Minima of Deep Neural Networks
S. Du, Jason D. Lee, Haochuan Li, Liwei Wang, Masayoshi Tomizuka (09 Nov 2018)

Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes
Roman Novak, Lechao Xiao, Jaehoon Lee, Yasaman Bahri, Greg Yang, Jiri Hron, Daniel A. Abolafia, Jeffrey Pennington, Jascha Narain Sohl-Dickstein (11 Oct 2018)

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
Yuanzhi Li, Yingyu Liang (03 Aug 2018)

Stochastic natural gradient descent draws posterior samples in function space
Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Narain Sohl-Dickstein (25 Jun 2018)

Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Arthur Jacot, Franck Gabriel, Clément Hongler (20 Jun 2018)

A Mean Field View of the Landscape of Two-Layers Neural Networks
Song Mei, Andrea Montanari, Phan-Minh Nguyen (18 Apr 2018)

Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le (01 Nov 2017)

Deep Neural Networks as Gaussian Processes
Jaehoon Lee, Yasaman Bahri, Roman Novak, S. Schoenholz, Jeffrey Pennington, Jascha Narain Sohl-Dickstein (01 Nov 2017)

A Bayesian Perspective on Generalization and Stochastic Gradient Descent
Samuel L. Smith, Quoc V. Le (17 Oct 2017)

Stochastic Gradient Descent as Approximate Bayesian Inference
Stephan Mandt, Matthew D. Hoffman, David M. Blei (13 Apr 2017)

Sharp Minima Can Generalize For Deep Nets
Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio (15 Mar 2017)

SGD Learns the Conjugate Kernel Class of the Network
Amit Daniely (27 Feb 2017)

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang (15 Sep 2016)

Wide Residual Networks
Sergey Zagoruyko, N. Komodakis (23 May 2016)