Cited By: The Break-Even Point on Optimization Trajectories of Deep Neural Networks (arXiv:2002.09572)
21 February 2020
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras
Papers citing "The Break-Even Point on Optimization Trajectories of Deep Neural Networks" (38 of 38 papers shown):
- Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos. Dayal Singh Kalra, Tianyu He, M. Barkeshli. 17 Feb 2025.
- Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks. Pierfrancesco Beneventano, Blake Woodworth. 15 Jan 2025.
- Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training. Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan. 14 Oct 2024.
- Can Optimization Trajectories Explain Multi-Task Transfer? David Mueller, Mark Dredze, Nicholas Andrews. 26 Aug 2024.
- Does SGD really happen in tiny subspaces? Minhak Song, Kwangjun Ahn, Chulhee Yun. 25 May 2024.
- SADDLe: Sharpness-Aware Decentralized Deep Learning with Heterogeneous Data. Sakshi Choudhary, Sai Aparna Aketi, Kaushik Roy. 22 May 2024.
- High dimensional analysis reveals conservative sharpening and a stochastic edge of stability. Atish Agarwala, Jeffrey Pennington. 30 Apr 2024.
- Investigation into the Training Dynamics of Learned Optimizers. Jan Sobotka, Petr Simánek, Daniel Vasata. 12 Dec 2023.
- A Coefficient Makes SVRG Effective. Yida Yin, Zhiqiu Xu, Zhiyuan Li, Trevor Darrell, Zhuang Liu. 09 Nov 2023.
- From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression. Xuxing Chen, Krishnakumar Balasubramanian, Promit Ghosal, Bhavya Agrawalla. 02 Oct 2023.
- Sharpness-Aware Minimization and the Edge of Stability. Philip M. Long, Peter L. Bartlett. 21 Sep 2023.
- Dropout Reduces Underfitting. Zhuang Liu, Zhi-Qin John Xu, Joseph Jin, Zhiqiang Shen, Trevor Darrell. 02 Mar 2023.
- mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization. Kayhan Behdin, Qingquan Song, Aman Gupta, S. Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, D. Durfee, Rahul Mazumder. 19 Feb 2023.
- SAM operates far from home: eigenvalue regularization as a dynamical phenomenon. Atish Agarwala, Yann N. Dauphin. 17 Feb 2023.
- On a continuous time model of gradient descent dynamics and instability in deep learning. Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin. 03 Feb 2023.
- Learning threshold neurons via the "edge of stability". Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang. 14 Dec 2022.
- Leveraging Unlabeled Data to Track Memorization. Mahsa Forouzesh, Hanie Sedghi, Patrick Thiran. 08 Dec 2022.
- A survey of deep learning optimizers -- first and second order methods. Rohan Kashyap. 28 Nov 2022.
- Easy Begun is Half Done: Spatial-Temporal Graph Modeling with ST-Curriculum Dropout. Hongjun Wang, Jiyuan Chen, Tongbo Pan, Z. Fan, Boyuan Zhang, Renhe Jiang, Lingyu Zhang, Yi Xie, Zhongyin Wang, Xuan Song. 28 Nov 2022.
- Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach. Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji, Dacheng Tao. 11 Oct 2022.
- Understanding Edge-of-Stability Training Dynamics with a Minimalist Example. Xingyu Zhu, Zixuan Wang, Xiang Wang, Mo Zhou, Rong Ge. 07 Oct 2022.
- Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging. Jean Kaddour. 29 Sep 2022.
- Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability. Z. Li, Zixuan Wang, Jian Li. 26 Jul 2022.
- Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction. Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora. 14 Jun 2022.
- Linear Connectivity Reveals Generalization Strategies. Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, Naomi Saphra. 24 May 2022.
- High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation. Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang. 03 May 2022.
- How Do Vision Transformers Work? Namuk Park, Songkuk Kim. 14 Feb 2022.
- Exponential escape efficiency of SGD from sharp minima in non-stationary regime. Hikaru Ibayashi, Masaaki Imaizumi. 07 Nov 2021.
- A Loss Curvature Perspective on Training Instability in Deep Learning. Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David E. Cardoze, George E. Dahl, Zachary Nado, Orhan Firat. 08 Oct 2021.
- Logit Attenuating Weight Normalization. Aman Gupta, R. Ramanath, Jun Shi, Anika Ramachandran, Sirou Zhou, Mingzhou Zhou, S. Keerthi. 12 Aug 2021.
- What can linear interpolation of neural network loss landscapes tell us? Tiffany J. Vlaar, Jonathan Frankle. 30 Jun 2021.
- Intraclass clustering: an implicit learning ability that regularizes DNNs. Simon Carbonnelle, Christophe De Vleeschouwer. 11 Mar 2021.
- Consensus Control for Decentralized Deep Learning. Lingjing Kong, Tao R. Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich. 09 Feb 2021.
- A Random Matrix Theory Approach to Damping in Deep Learning. Diego Granziol, Nicholas P. Baskerville. 15 Nov 2020.
- Sharpness-Aware Minimization for Efficiently Improving Generalization. Pierre Foret, Ariel Kleiner, H. Mobahi, Behnam Neyshabur. 03 Oct 2020.
- The large learning rate phase of deep learning: the catapult mechanism. Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari. 04 Mar 2020.
- Iterative Averaging in the Quest for Best Test Error. Diego Granziol, Xingchen Wan, Samuel Albanie, Stephen J. Roberts. 02 Mar 2020.
- On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. 15 Sep 2016.