2110.06914
Cited By
What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
13 October 2021
Zhiyuan Li
Tianhao Wang
Sanjeev Arora
MLT
Papers citing
"What Happens after SGD Reaches Zero Loss? --A Mathematical Framework"
32 / 32 papers shown
Title
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
Zhanpeng Zhou
Mingze Wang
Yuchen Mao
Bingrui Li
Junchi Yan
AAML
59
0
0
14 Oct 2024
Nesterov acceleration in benignly non-convex landscapes
Kanan Gupta
Stephan Wojtowytsch
36
2
0
10 Oct 2024
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD
Pierfrancesco Beneventano
Andrea Pinto
Tomaso A. Poggio
MLT
27
1
0
17 Jun 2024
Reparameterization invariance in approximate Bayesian inference
Hrittik Roy
M. Miani
Carl Henrik Ek
Philipp Hennig
Marvin Pfortner
Lukas Tatzel
Søren Hauberg
BDL
42
8
0
05 Jun 2024
Does SGD really happen in tiny subspaces?
Minhak Song
Kwangjun Ahn
Chulhee Yun
61
4
1
25 May 2024
Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
Shuo Xie
Zhiyuan Li
OffRL
35
12
0
05 Apr 2024
Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature Learning
Yuxiao Wen
Arthur Jacot
47
6
0
12 Feb 2024
Stochastic Modified Flows for Riemannian Stochastic Gradient Descent
Benjamin Gess
Sebastian Kassing
Nimit Rana
34
0
0
02 Feb 2024
A Coefficient Makes SVRG Effective
Yida Yin
Zhiqiu Xu
Zhiyuan Li
Trevor Darrell
Zhuang Liu
25
1
0
09 Nov 2023
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization
Kaiyue Wen
Zhiyuan Li
Tengyu Ma
FAtt
36
26
0
20 Jul 2023
How to escape sharp minima with random perturbations
Kwangjun Ahn
Ali Jadbabaie
S. Sra
ODL
26
6
0
25 May 2023
Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models
Alexandru Damian
Eshaan Nichani
Rong Ge
Jason D. Lee
MLT
34
33
0
18 May 2023
mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization
Kayhan Behdin
Qingquan Song
Aman Gupta
S. Keerthi
Ayan Acharya
Borja Ocejo
Gregory Dexter
Rajiv Khanna
D. Durfee
Rahul Mazumder
AAML
13
7
0
19 Feb 2023
The Geometry of Neural Nets' Parameter Spaces Under Reparametrization
Agustinus Kristiadi
Felix Dangel
Philipp Hennig
22
11
0
14 Feb 2023
Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent
Benjamin Gess
Sebastian Kassing
Vitalii Konarovskyi
DiffM
24
6
0
14 Feb 2023
On the Lipschitz Constant of Deep Networks and Double Descent
Matteo Gamba
Hossein Azizpour
Marten Bjorkman
19
7
0
28 Jan 2023
Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing
Jikai Jin
Zhiyuan Li
Kaifeng Lyu
S. Du
Jason D. Lee
MLT
46
34
0
27 Jan 2023
How Does Sharpness-Aware Minimization Minimize Sharpness?
Kaiyue Wen
Tengyu Ma
Zhiyuan Li
AAML
21
47
0
10 Nov 2022
Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis
Taiki Miyagawa
35
9
0
28 Oct 2022
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
Hong Liu
Sang Michael Xie
Zhiyuan Li
Tengyu Ma
AI4CE
32
49
0
25 Oct 2022
Noise Injection as a Probe of Deep Learning Dynamics
Noam Levi
I. Bloch
M. Freytsis
T. Volansky
32
2
0
24 Oct 2022
Deep Double Descent via Smooth Interpolation
Matteo Gamba
Erik Englesson
Marten Bjorkman
Hossein Azizpour
53
10
0
21 Sep 2022
On the Implicit Bias in Deep-Learning Algorithms
Gal Vardi
FedML
AI4CE
30
72
0
26 Aug 2022
Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability
Z. Li
Zixuan Wang
Jian Li
19
42
0
26 Jul 2022
Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
Zhiyuan Li
Tianhao Wang
Jason D. Lee
Sanjeev Arora
32
27
0
08 Jul 2022
Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation
Loucas Pillaud-Vivien
J. Reygner
Nicolas Flammarion
NoLa
31
31
0
20 Jun 2022
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu
Zhiyuan Li
Sanjeev Arora
FAtt
35
69
0
14 Jun 2022
Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes
Chao Ma
D. Kunin
Lei Wu
Lexing Ying
25
27
0
24 Apr 2022
Anticorrelated Noise Injection for Improved Generalization
Antonio Orvieto
Hans Kersting
F. Proske
Francis R. Bach
Aurélien Lucchi
53
44
0
06 Feb 2022
Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks
Noam Razin
Asaf Maman
Nadav Cohen
33
29
0
27 Jan 2022
First-order Methods Almost Always Avoid Saddle Points
J. Lee
Ioannis Panageas
Georgios Piliouras
Max Simchowitz
Michael I. Jordan
Benjamin Recht
ODL
87
82
0
20 Oct 2017
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
278
2,888
0
15 Sep 2016