Understanding Gradient Descent on Edge of Stability in Deep Learning

19 May 2022

Papers citing "Understanding Gradient Descent on Edge of Stability in Deep Learning"

28 / 28 papers shown

Title
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos Dayal Singh Kalra Tianyu He M. Barkeshli 49 4 0 17 Feb 2025
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training Zhanpeng Zhou Mingze Wang Yuchen Mao Bingrui Li Junchi Yan AAML 62 0 0 14 Oct 2024
Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm Jinwei Zhao Marco Gori Alessandro Betti S. Melacci Hongtao Zhang Jiedong Liu Xinhong Hei 30 0 0 10 Sep 2024
Does SGD really happen in tiny subspaces? Minhak Song Kwangjun Ahn Chulhee Yun 71 4 1 25 May 2024
$Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization$ Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization Shuo Xie Zhiyuan Li OffRL 47 13 0 05 Apr 2024
From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression Xuxing Chen Krishnakumar Balasubramanian Promit Ghosal Bhavya Agrawalla 33 7 0 02 Oct 2023
Sharpness-Aware Minimization and the Edge of Stability Philip M. Long Peter L. Bartlett AAML 27 9 0 21 Sep 2023
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization Kaiyue Wen Zhiyuan Li Tengyu Ma FAtt 38 26 0 20 Jul 2023
How to escape sharp minima with random perturbations Kwangjun Ahn Ali Jadbabaie S. Sra ODL 32 6 0 25 May 2023
Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond Itai Kreisler Mor Shpigel Nacson Daniel Soudry Y. Carmon 30 13 0 22 May 2023
Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability Jingfeng Wu Vladimir Braverman Jason D. Lee 32 17 0 19 May 2023
mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization Kayhan Behdin Qingquan Song Aman Gupta S. Keerthi Ayan Acharya Borja Ocejo Gregory Dexter Rajiv Khanna D. Durfee Rahul Mazumder AAML 18 7 0 19 Feb 2023
The Geometry of Neural Nets' Parameter Spaces Under Reparametrization Agustinus Kristiadi Felix Dangel Philipp Hennig 32 11 0 14 Feb 2023
On a continuous time model of gradient descent dynamics and instability in deep learning Mihaela Rosca Yan Wu Chongli Qin Benoit Dherin 16 6 0 03 Feb 2023
Stretched and measured neural predictions of complex network dynamics V. Vasiliauskaite Nino Antulov-Fantulin 30 1 0 12 Jan 2023
Learning threshold neurons via the "edge of stability" Kwangjun Ahn Sébastien Bubeck Sinho Chewi Y. Lee Felipe Suarez Yi Zhang MLT 38 36 0 14 Dec 2022
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States Ziqiao Wang Yongyi Mao 30 10 0 19 Nov 2022
How Does Sharpness-Aware Minimization Minimize Sharpness? Kaiyue Wen Tengyu Ma Zhiyuan Li AAML 23 47 0 10 Nov 2022
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models Hong Liu Sang Michael Xie Zhiyuan Li Tengyu Ma AI4CE 40 49 0 25 Oct 2022
On skip connections and normalisation layers in deep optimisation L. MacDonald Jack Valmadre Hemanth Saratchandran Simon Lucey ODL 19 1 0 10 Oct 2022
Understanding Edge-of-Stability Training Dynamics with a Minimalist Example Xingyu Zhu Zixuan Wang Xiang Wang Mo Zhou Rong Ge 66 35 0 07 Oct 2022
The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima Peter L. Bartlett Philip M. Long Olivier Bousquet 76 34 0 04 Oct 2022
Deep Double Descent via Smooth Interpolation Matteo Gamba Erik Englesson Marten Bjorkman Hossein Azizpour 63 10 0 21 Sep 2022
On the Implicit Bias in Deep-Learning Algorithms Gal Vardi FedML AI4CE 34 72 0 26 Aug 2022
Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability Z. Li Zixuan Wang Jian Li 19 42 0 26 Jul 2022
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction Kaifeng Lyu Zhiyuan Li Sanjeev Arora FAtt 40 69 0 14 Jun 2022
High-dimensional limit theorems for SGD: Effective dynamics and critical scaling Gerard Ben Arous Reza Gheissari Aukosh Jagannath 62 58 0 08 Jun 2022
Understanding the unstable convergence of gradient descent Kwangjun Ahn Junzhe Zhang S. Sra 28 57 0 03 Apr 2022