2207.12678
Cited By
v1
v2 (latest)
Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability
26 July 2022
Z. Li
Zixuan Wang
Jian Li
Papers citing
"Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability"
29 / 29 papers shown
Title
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
Ruiqi Zhang
Jingfeng Wu
Licong Lin
Peter L. Bartlett
73
2
0
05 Apr 2025
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Dayal Singh Kalra
Tianyu He
M. Barkeshli
121
7
0
17 Feb 2025
Improving Multi-task Learning via Seeking Task-based Flat Regions
Hoang Phan
Lam C. Tran
Ngoc N. Tran
Tuan Truong
Qi Lei
Nhat Ho
Dinh Q. Phung
Trung Le
205
11
0
24 Nov 2022
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu
Zhiyuan Li
Sanjeev Arora
FAtt
83
75
0
14 Jun 2022
Understanding Gradient Descent on Edge of Stability in Deep Learning
Sanjeev Arora
Zhiyuan Li
A. Panigrahi
MLT
110
99
0
19 May 2022
Understanding the unstable convergence of gradient descent
Kwangjun Ahn
J.N. Zhang
S. Sra
82
63
0
03 Apr 2022
What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
Zhiyuan Li
Tianhao Wang
Sanjeev Arora
MLT
113
105
0
13 Oct 2021
Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
Jeremy M. Cohen
Simran Kaur
Yuanzhi Li
J. Zico Kolter
Ameet Talwalkar
ODL
104
277
0
26 Feb 2021
Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks
Yikai Wu
Xingyu Zhu
Chenwei Wu
Annie Wang
Rong Ge
105
45
0
08 Oct 2020
Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret
Ariel Kleiner
H. Mobahi
Behnam Neyshabur
AAML
199
1,358
0
03 Oct 2020
The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks
Wei Hu
Lechao Xiao
Ben Adlam
Jeffrey Pennington
64
63
0
25 Jun 2020
The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz
Yasaman Bahri
Ethan Dyer
Jascha Narain Sohl-Dickstein
Guy Gur-Ari
ODL
202
241
0
04 Mar 2020
The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski
Maciej Szymczak
Stanislav Fort
Devansh Arpit
Jacek Tabor
Kyunghyun Cho
Krzysztof J. Geras
85
163
0
21 Feb 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
547
42,639
0
03 Dec 2019
Pathological spectra of the Fisher information metric and its variants in deep neural networks
Ryo Karakida
S. Akaho
S. Amari
63
28
0
14 Oct 2019
Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks
Ziwei Ji
Matus Telgarsky
72
178
0
26 Sep 2019
The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks
Ryo Karakida
S. Akaho
S. Amari
63
41
0
07 Jun 2019
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
Jaehoon Lee
Lechao Xiao
S. Schoenholz
Yasaman Bahri
Roman Novak
Jascha Narain Sohl-Dickstein
Jeffrey Pennington
213
1,108
0
18 Feb 2019
Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
Sanjeev Arora
S. Du
Wei Hu
Zhiyuan Li
Ruosong Wang
MLT
208
974
0
24 Jan 2019
Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians
Vardan Papyan
69
88
0
24 Jan 2019
On Lazy Training in Differentiable Programming
Lénaïc Chizat
Edouard Oyallon
Francis R. Bach
111
840
0
19 Dec 2018
Gradient Descent Finds Global Minima of Deep Neural Networks
S. Du
Jason D. Lee
Haochuan Li
Liwei Wang
Xiyu Zhai
ODL
229
1,136
0
09 Nov 2018
Gradient Descent Provably Optimizes Over-parameterized Neural Networks
S. Du
Xiyu Zhai
Barnabás Póczós
Aarti Singh
MLT
ODL
233
1,276
0
04 Oct 2018
Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Arthur Jacot
Franck Gabriel
Clément Hongler
273
3,223
0
20 Jun 2018
A Walk with SGD
Chen Xing
Devansh Arpit
Christos Tsirigotis
Yoshua Bengio
94
119
0
24 Feb 2018
Empirical Analysis of the Hessian of Over-Parametrized Neural Networks
Levent Sagun
Utku Evci
V. U. Güney
Yann N. Dauphin
Léon Bottou
56
418
0
14 Jun 2017
Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond
Levent Sagun
Léon Bottou
Yann LeCun
UQCV
95
236
0
22 Nov 2016
Optimization Methods for Large-Scale Machine Learning
Léon Bottou
Frank E. Curtis
J. Nocedal
252
3,225
0
15 Jun 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,510
0
10 Dec 2015