Cited By

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora
arXiv 2010.02916 · 6 October 2020

Papers citing "Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate" (21 papers shown):

Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries
Chris Kolb, T. Weber, Bernd Bischl, David Rügamer · 04 Feb 2025

Normalization and effective learning rates in reinforcement learning
Clare Lyle, Zeyu Zheng, Khimya Khetarpal, James Martens, H. V. Hasselt, Razvan Pascanu, Will Dabney · 01 Jul 2024

How to set AdamW's weight decay as you scale model and dataset size
Xi Wang, Laurence Aitchison · 22 May 2024

Implicit Bias of AdamW: ℓ∞ Norm Constrained Optimization
Shuo Xie, Zhiyuan Li · 05 Apr 2024

Directional Smoothness and Gradient Methods: Convergence and Adaptivity
Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert Mansel Gower · 06 Mar 2024

Analyzing and Improving the Training Dynamics of Diffusion Models
Tero Karras, M. Aittala, J. Lehtinen, Janne Hellsten, Timo Aila, S. Laine · 05 Dec 2023

Large Learning Rates Improve Generalization: But How Large Are We Talking About?
E. Lobacheva, Eduard Pockonechnyy, M. Kodryan, Dmitry Vetrov · 19 Nov 2023

A Modern Look at the Relationship between Sharpness and Generalization
Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion · 14 Feb 2023

An SDE for Modeling SAM: Theory and Insights
Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, F. Proske, Hans Kersting, Aurelien Lucchi · 19 Jan 2023

Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis
Taiki Miyagawa · 28 Oct 2022

SGD with Large Step Sizes Learns Sparse Features
Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion · 11 Oct 2022

Adapting the Linearised Laplace Model Evidence for Modern Deep Learning
Javier Antorán, David Janz, J. Allingham, Erik A. Daxberger, Riccardo Barbano, Eric T. Nalisnick, José Miguel Hernández-Lobato · 17 Jun 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora · 14 Jun 2022

Robust Training of Neural Networks Using Scale Invariant Architectures
Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Sanjiv Kumar · 02 Feb 2022

Stochastic Training is Not Necessary for Generalization
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein · 29 Sep 2021

The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins · 19 Jul 2021

How to decay your learning rate
Aitor Lewkowycz · 23 Mar 2021

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li, Sadhika Malladi, Sanjeev Arora · 24 Feb 2021

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
D. Kunin, Javier Sagastuy-Breña, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka · 08 Dec 2020

On the training dynamics of deep networks with L₂ regularization
Aitor Lewkowycz, Guy Gur-Ari · 15 Jun 2020

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · 15 Sep 2016