ResearchTrend.AI

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate

arXiv:2010.02916 · 6 October 2020
Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora

Papers citing "Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate"

21 / 21 papers shown
Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries
  Chris Kolb, T. Weber, Bernd Bischl, David Rügamer
  04 Feb 2025

Normalization and effective learning rates in reinforcement learning
  Clare Lyle, Zeyu Zheng, Khimya Khetarpal, James Martens, H. V. Hasselt, Razvan Pascanu, Will Dabney
  01 Jul 2024

How to set AdamW's weight decay as you scale model and dataset size
  Xi Wang, Laurence Aitchison
  22 May 2024

Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
  Shuo Xie, Zhiyuan Li
  05 Apr 2024 · OffRL
Directional Smoothness and Gradient Methods: Convergence and Adaptivity
  Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert Mansel Gower
  06 Mar 2024

Analyzing and Improving the Training Dynamics of Diffusion Models
  Tero Karras, M. Aittala, J. Lehtinen, Janne Hellsten, Timo Aila, S. Laine
  05 Dec 2023

Large Learning Rates Improve Generalization: But How Large Are We Talking About?
  E. Lobacheva, Eduard Pockonechnyy, M. Kodryan, Dmitry Vetrov
  19 Nov 2023 · AI4CE

A Modern Look at the Relationship between Sharpness and Generalization
  Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion
  14 Feb 2023 · 3DH
An SDE for Modeling SAM: Theory and Insights
  Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, F. Proske, Hans Kersting, Aurelien Lucchi
  19 Jan 2023

Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis
  Taiki Miyagawa
  28 Oct 2022

SGD with Large Step Sizes Learns Sparse Features
  Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion
  11 Oct 2022

Adapting the Linearised Laplace Model Evidence for Modern Deep Learning
  Javier Antorán, David Janz, J. Allingham, Erik A. Daxberger, Riccardo Barbano, Eric T. Nalisnick, José Miguel Hernández-Lobato
  17 Jun 2022 · UQCV, BDL
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
  Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora
  14 Jun 2022 · FAtt

Robust Training of Neural Networks Using Scale Invariant Architectures
  Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Surinder Kumar
  02 Feb 2022

Stochastic Training is Not Necessary for Generalization
  Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein
  29 Sep 2021

The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
  D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins
  19 Jul 2021

How to decay your learning rate
  Aitor Lewkowycz
  23 Mar 2021
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
  Zhiyuan Li, Sadhika Malladi, Sanjeev Arora
  24 Feb 2021

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
  D. Kunin, Javier Sagastuy-Breña, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka
  08 Dec 2020

On the training dynamics of deep networks with $L_2$ regularization
  Aitor Lewkowycz, Guy Gur-Ari
  15 Jun 2020

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
  N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
  15 Sep 2016 · ODL