Understanding Edge-of-Stability Training Dynamics with a Minimalist Example

7 October 2022

Papers citing "Understanding Edge-of-Stability Training Dynamics with a Minimalist Example"

| Title | Authors | Tag | | | | Date |
|---|---|---|---|---|---|---|
| Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes | Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett | | 56 | 2 | 0 | 05 Apr 2025 |
| Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos | Dayal Singh Kalra, Tianyu He, M. Barkeshli | | 105 | 6 | 0 | 17 Feb 2025 |
| Does SGD really happen in tiny subspaces? | Minhak Song, Kwangjun Ahn, Chulhee Yun | | 95 | 6 | 1 | 25 May 2024 |
| Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability | Z. Li, Zixuan Wang, Jian Li | | 41 | 46 | 0 | 26 Jul 2022 |
| Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction | Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora | FAtt | 67 | 73 | 0 | 14 Jun 2022 |
| Understanding Gradient Descent on Edge of Stability in Deep Learning | Sanjeev Arora, Zhiyuan Li, A. Panigrahi | MLT | 88 | 97 | 0 | 19 May 2022 |
| Understanding the unstable convergence of gradient descent | Kwangjun Ahn, J.N. Zhang, S. Sra | | 67 | 60 | 0 | 03 Apr 2022 |
| Robust Training of Neural Networks Using Scale Invariant Architectures | Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Surinder Kumar | | 53 | 28 | 0 | 02 Feb 2022 |
| Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect | Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao | AI4CE | 80 | 40 | 0 | 07 Oct 2021 |
| Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability | Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar | ODL | 73 | 267 | 0 | 26 Feb 2021 |
| Tilting the playing field: Dynamical loss functions for machine learning | M. Ruíz-García, Ge Zhang, S. Schoenholz, Andrea J. Liu | | 92 | 11 | 0 | 07 Feb 2021 |
| The large learning rate phase of deep learning: the catapult mechanism | Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari | ODL | 182 | 241 | 0 | 04 Mar 2020 |
| The Break-Even Point on Optimization Trajectories of Deep Neural Networks | Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras | | 76 | 161 | 0 | 21 Feb 2020 |
| PyHessian: Neural Networks Through the Lens of the Hessian | Z. Yao, A. Gholami, Kurt Keutzer, Michael W. Mahoney | ODL | 51 | 302 | 0 | 16 Dec 2019 |
| Theoretical Analysis of Auto Rate-Tuning by Batch Normalization | Sanjeev Arora, Zhiyuan Li, Kaifeng Lyu | | 72 | 131 | 0 | 10 Dec 2018 |
| A Walk with SGD | Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio | | 85 | 119 | 0 | 24 Feb 2018 |