Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar · ODL
26 February 2021 · arXiv:2103.00065
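The claim in the title concerns a specific quantity: the "sharpness", the largest eigenvalue of the training-loss Hessian. The paper observes that under full-batch gradient descent with step size η, the sharpness first rises steadily ("progressive sharpening") and then hovers just above 2/η, the edge of stability, while the loss continues to decrease over long timescales. The sketch below shows one standard way to track this quantity, power iteration on Hessian-vector products. It is a minimal illustration, not the authors' code: the toy network, data, learning rate, and iteration counts are assumptions made for the example.

```python
# Minimal sketch (not the authors' code): track the sharpness, i.e. the top
# eigenvalue of the training-loss Hessian, during full-batch gradient descent.
# The network, data, and hyperparameters below are illustrative assumptions.
import torch

torch.manual_seed(0)
X, y = torch.randn(64, 10), torch.randn(64, 1)   # toy regression problem
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
params = list(model.parameters())
lr = 0.02                                        # GD step size (eta)

def loss_fn():
    return torch.nn.functional.mse_loss(model(X), y)

def sharpness(n_iters=20):
    """Estimate the top Hessian eigenvalue by power iteration on HVPs."""
    grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(n_iters):
        # Hessian-vector product: differentiate the gradient against v.
        hv = torch.autograd.grad(grads, params, grad_outputs=v,
                                 retain_graph=True)
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / norm for h in hv]
    hv = torch.autograd.grad(grads, params, grad_outputs=v)
    return sum((h * u).sum() for h, u in zip(hv, v)).item()  # Rayleigh quotient

for step in range(2001):
    loss = loss_fn()
    grad = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grad):
            p -= lr * g                          # vanilla full-batch GD
    if step % 200 == 0:
        # Edge of stability: sharpness rises toward, then hovers near, 2/lr.
        print(f"step {step:4d}  loss {loss.item():.4f}  "
              f"sharpness {sharpness():.2f}  (2/lr = {2 / lr:.0f})")
```

Rerunning the sketch with a different lr moves the plateau to the new 2/lr, which is the simplest way to see the step-size dependence the paper describes.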
Papers citing "Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability"
50 of 77 citing papers shown
Contextures: Representations from Contexts
Runtian Zhai, Kai Yang, Che-Ping Tsai, Burak Varici, Zico Kolter, Pradeep Ravikumar · 02 May 2025

Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett · 05 Apr 2025

Feature Learning Beyond the Edge of Stability
Dávid Terjék · MLT · 18 Feb 2025

Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Dayal Singh Kalra, Tianyu He, M. Barkeshli · 17 Feb 2025

Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
Pierfrancesco Beneventano, Blake Woodworth · MLT · 15 Jan 2025

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Tianjin Huang, Ziquan Zhu, Gaojie Jin, Lu Liu, Zhangyang Wang, Shiwei Liu · 12 Jan 2025

Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Jim Zhao, Sidak Pal Singh, Aurelien Lucchi · AI4CE · 04 Nov 2024

Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan · AAML · 14 Oct 2024

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh · 14 Oct 2024

A Parameter Update Balancing Algorithm for Multi-task Ranking Models in Recommendation Systems
Jun Yuan, Guohao Cai, Zhenhua Dong · 08 Oct 2024

Residual Kolmogorov-Arnold Network for Enhanced Deep Learning
Ray Congrui Yu, Sherry Wu, Jiang Gui · 07 Oct 2024

The Optimization Landscape of SGD Across the Feature Learning Strength
Alexander B. Atanasov, Alexandru Meterez, James B. Simon, Cengiz Pehlevan · 06 Oct 2024

Can Optimization Trajectories Explain Multi-Task Transfer?
David Mueller, Mark Dredze, Nicholas Andrews · 26 Aug 2024

Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition
Kenzo Clauw, S. Stramaglia, Daniele Marinazzo · 16 Aug 2024

How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning
Arthur Jacot, Seok Hoan Choi, Yuxiao Wen · AI4CE · 08 Jul 2024

Normalization and effective learning rates in reinforcement learning
Clare Lyle, Zeyu Zheng, Khimya Khetarpal, James Martens, H. V. Hasselt, Razvan Pascanu, Will Dabney · 01 Jul 2024

The Impact of Initialization on LoRA Finetuning Dynamics
Soufiane Hayou, Nikhil Ghosh, Bin Yu · AI4CE · 12 Jun 2024

Understanding and Minimising Outlier Features in Neural Network Training
Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann · 29 May 2024

Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement Learning
Aneesh Muppidi, Zhiyu Zhang, Heng Yang · 26 May 2024

Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun · 25 May 2024

Why is SAM Robust to Label Noise?
Christina Baek, Zico Kolter, Aditi Raghunathan · NoLa, AAML · 06 May 2024

Inexact subgradient methods for semialgebraic functions
Jérôme Bolte, Tam Le, Éric Moulines, Edouard Pauwels · 30 Apr 2024

High dimensional analysis reveals conservative sharpening and a stochastic edge of stability
Atish Agarwala, Jeffrey Pennington · 30 Apr 2024

Directional Smoothness and Gradient Methods: Convergence and Adaptivity
Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert Mansel Gower · 06 Mar 2024

LoRA+: Efficient Low Rank Adaptation of Large Models
Soufiane Hayou, Nikhil Ghosh, Bin Yu · AI4CE · 19 Feb 2024

GD doesn't make the cut: Three ways that non-differentiability affects neural network training
Siddharth Krishna Kumar · AAML · 16 Jan 2024

Directions of Curvature as an Explanation for Loss of Plasticity
Alex Lewandowski, Haruto Tanaka, Dale Schuurmans, Marlos C. Machado · 30 Nov 2023

Spectral alignment of stochastic gradient descent for high-dimensional classification tasks
Gerard Ben Arous, Reza Gheissari, Jiaoyang Huang, Aukosh Jagannath · 04 Oct 2023

From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression
Xuxing Chen, Krishnakumar Balasubramanian, Promit Ghosal, Bhavya Agrawalla · 02 Oct 2023

Small-scale proxies for large-scale Transformer training instabilities
Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, A. Alemi, ..., Jascha Narain Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith · 25 Sep 2023

Sharpness-Aware Minimization and the Edge of Stability
Philip M. Long, Peter L. Bartlett · AAML · 21 Sep 2023

Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima
Dongkuk Si, Chulhee Yun · 16 Jun 2023

Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff
Arthur Jacot · MLT · 30 May 2023

Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond
Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Y. Carmon · 22 May 2023

Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability
Jingfeng Wu, Vladimir Braverman, Jason D. Lee · 19 May 2023

Convergence of Alternating Gradient Descent for Matrix Factorization
R. Ward, T. Kolda · 11 May 2023

Learning Trajectories are Generalization Indicators
Jingwen Fu, Zhizheng Zhang, Dacheng Yin, Yan Lu, Nanning Zheng · AI4CE · 25 Apr 2023

Automatic Gradient Descent: Deep Learning without Hyperparameters
Jeremy Bernstein, Chris Mingard, Kevin Huang, Navid Azizan, Yisong Yue · ODL · 11 Apr 2023

Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks
Blake Bordelon, Cengiz Pehlevan · MLT · 06 Apr 2023

Understanding plasticity in neural networks
Clare Lyle, Zeyu Zheng, Evgenii Nikishin, Bernardo Avila-Pires, Razvan Pascanu, Will Dabney · AI4CE · 02 Mar 2023

mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization
Kayhan Behdin, Qingquan Song, Aman Gupta, S. Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, D. Durfee, Rahul Mazumder · AAML · 19 Feb 2023

Improving Training Stability for Multitask Ranking Models in Recommender Systems
Jiaxi Tang, Yoel Drori, Daryl Chang, M. Sathiamoorthy, Justin Gilmer, Li Wei, Xinyang Yi, Lichan Hong, Ed H. Chi · 17 Feb 2023

SAM operates far from home: eigenvalue regularization as a dynamical phenomenon
Atish Agarwala, Yann N. Dauphin · 17 Feb 2023

A Modern Look at the Relationship between Sharpness and Generalization
Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion · 3DH · 14 Feb 2023

On a continuous time model of gradient descent dynamics and instability in deep learning
Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin · 03 Feb 2023

Implicit regularization in Heavy-ball momentum accelerated stochastic gradient descent
Avrajit Ghosh, He Lyu, Xitong Zhang, Rongrong Wang · 02 Feb 2023

Catapult Dynamics and Phase Transitions in Quadratic Nets
David Meltzer, Junyu Liu · 18 Jan 2023

Training trajectories, mini-batch losses and the curious role of the learning rate
Mark Sandler, A. Zhmoginov, Max Vladymyrov, Nolan Miller · ODL · 05 Jan 2023

Learning threshold neurons via the "edge of stability"
Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang · MLT · 14 Dec 2022

Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
Ziqiao Wang, Yongyi Mao · 19 Nov 2022