arXiv: 1807.05031 (v6)
On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length
13 July 2018
Stanislaw Jastrzebski, Zachary Kenton, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey [ODL]

Papers citing "On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length" (45 papers)

Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More
Geonhui Yoo, Minhak Song, Chulhee Yun. 07 Jun 2025. [FAtt]

SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
Dahun Shin, Dongyeop Lee, Jinseok Chung, Namhoon Lee. 25 Feb 2025. [ODL, AAML]

Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun. 25 May 2024.

Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks
Yichu Xu, Xin-Chun Li, Lan Li, De-Chuan Zhan. 21 May 2024.

Training-time Neuron Alignment through Permutation Subspace for Improving Linear Mode Connectivity and Model Fusion
Zexi Li, Zhiqi Li, Jie Lin, Tao Shen, Tao Lin, Chao Wu. 02 Feb 2024.

Stochastic Modified Equations and Dynamics of Dropout Algorithm
Zhongwang Zhang, Yuqing Li, Yaoyu Zhang, Z. Xu. 25 May 2023.

Evolutionary Algorithms in the Light of SGD: Limit Equivalence, Minima Flatness, and Transfer Learning
Andrei Kucharavy, R. Guerraoui, Ljiljana Dolamic. 20 May 2023.

Revisiting Weighted Aggregation in Federated Learning with Neural Networks
Zexi Li, Tao R. Lin, Xinyi Shang, Chao-Xiang Wu. 14 Feb 2023. [FedML]

Communication-Efficient Federated Learning for Heterogeneous Edge Devices Based on Adaptive Gradient Quantization
Heting Liu, Fang He, Guohong Cao. 16 Dec 2022. [FedML, MQ]

Learning threshold neurons via the "edge of stability"
Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang. 14 Dec 2022. [MLT]

Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability
Alexandru Damian, Eshaan Nichani, Jason D. Lee. 30 Sep 2022.

Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes
M. Kodryan, E. Lobacheva, M. Nakhodnov, Dmitry Vetrov. 08 Sep 2022.

Adaptive Gradient Methods at the Edge of Stability
Jeremy M. Cohen, Behrooz Ghorbani, Shankar Krishnan, Naman Agarwal, Sourabh Medapati, ..., Daniel Suo, David E. Cardoze, Zachary Nado, George E. Dahl, Justin Gilmer. 29 Jul 2022. [ODL]

Dataset Condensation via Efficient Synthetic-Data Parameterization
Jang-Hyun Kim, Jinuk Kim, Seong Joon Oh, Sangdoo Yun, Hwanjun Song, Joonhyun Jeong, Jung-Woo Ha, Hyun Oh Song. 30 May 2022. [DD]

Understanding the unstable convergence of gradient descent
Kwangjun Ahn, J.N. Zhang, S. Sra. 03 Apr 2022.

It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher
Kanghyun Choi, Hye Yoon Lee, Deokki Hong, Joonsang Yu, Noseong Park, Youngsok Kim, Jinho Lee. 31 Mar 2022. [MQ]

Complexity from Adaptive-Symmetries Breaking: Global Minima in the Statistical Mechanics of Deep Neural Networks
Shaun Li. 03 Jan 2022. [AI4CE]

An Empirical Investigation of the Role of Pre-training in Lifelong Learning
Sanket Vaibhav Mehta, Darshan Patil, Sarath Chandar, Emma Strubell. 16 Dec 2021. [CLL]

Critical Learning Periods in Federated Learning
Gang Yan, Hao Wang, Jian Li. 12 Sep 2021. [FedML]

The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins. 19 Jul 2021.

Regularization in ResNet with Stochastic Depth
Soufiane Hayou, Fadhel Ayed. 06 Jun 2021.

When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
Xiangning Chen, Cho-Jui Hsieh, Boqing Gong. 03 Jun 2021. [ViT]

Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar. 26 Feb 2021. [ODL]

Consensus Control for Decentralized Deep Learning
Lingjing Kong, Tao R. Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich. 09 Feb 2021.

Combating Mode Collapse in GAN training: An Empirical Analysis using Hessian Eigenvalues
Ricard Durall, Avraam Chatzimichailidis, P. Labus, J. Keuper. 17 Dec 2020. [GAN]

Geometry Perspective Of Estimating Learning Capability Of Neural Networks
Ankan Dutta, Arnab Rakshit. 03 Nov 2020.

Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification
Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos. 29 Oct 2020.

Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks
Yikai Wu, Xingyu Zhu, Chenwei Wu, Annie Wang, Rong Ge. 08 Oct 2020.

Traces of Class/Cross-Class Structure Pervade Deep Learning Spectra
Vardan Papyan. 27 Aug 2020.

Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
Diego Granziol, S. Zohren, Stephen J. Roberts. 16 Jun 2020. [ODL]

Understanding the Role of Training Regimes in Continual Learning
Seyed Iman Mirzadeh, Mehrdad Farajtabar, Razvan Pascanu, H. Ghasemzadeh. 12 Jun 2020. [CLL]

Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Preetum Nakkiran. 15 May 2020. [MLT]

Analyzing the discrepancy principle for kernelized spectral filter learning algorithms
Alain Celisse, Martin Wahl. 17 Apr 2020.

On Learning Rates and Schrödinger Operators
Bin Shi, Weijie J. Su, Michael I. Jordan. 15 Apr 2020.

Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule
Nikhil Iyer, V. Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu. 09 Mar 2020.

The Implicit and Explicit Regularization Effects of Dropout
Colin Wei, Sham Kakade, Tengyu Ma. 28 Feb 2020.

The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras. 21 Feb 2020.

Emergent properties of the local geometry of neural loss landscapes
Stanislav Fort, Surya Ganguli. 14 Oct 2019.

Improved Sample Complexities for Deep Networks and Robust Classification via an All-Layer Margin
Colin Wei, Tengyu Ma. 09 Oct 2019. [AAML, OOD]

GradVis: Visualization and Second Order Analysis of Optimization Surfaces during the Training of Deep Neural Networks
Avraam Chatzimichailidis, Franz-Josef Pfreundt, N. Gauger, J. Keuper. 26 Sep 2019.

Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
Xinyan Li, Qilong Gu, Yingxue Zhou, Tiancong Chen, A. Banerjee. 24 Jul 2019. [ODL]

Large Scale Structure of Neural Network Loss Landscapes
Stanislav Fort, Stanislaw Jastrzebski. 11 Jun 2019.

Stiffness: A New Perspective on Generalization in Neural Networks
Stanislav Fort, Pawel Krzysztof Nowak, Stanislaw Jastrzebski, S. Narayanan. 28 Jan 2019.

Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians
Vardan Papyan. 24 Jan 2019.

Laplacian Smoothing Gradient Descent
Stanley Osher, Bao Wang, Penghang Yin, Xiyang Luo, Farzin Barekat, Minh Pham, A. Lin. 17 Jun 2018. [ODL]