Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability

26 July 2022
Zhouzi Li, Zixuan Wang, Jian Li
arXiv: 2207.12678

Papers citing "Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability" (29 papers)

Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett (05 Apr 2025)

Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Dayal Singh Kalra, Tianyu He, M. Barkeshli (17 Feb 2025)

Improving Multi-task Learning via Seeking Task-based Flat Regions
Hoang Phan, Lam C. Tran, Ngoc N. Tran, Nhat Ho, Tuan Truong, Qi Lei, Dinh Q. Phung, Trung Le (24 Nov 2022)

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora (14 Jun 2022) [FAtt]

Understanding Gradient Descent on Edge of Stability in Deep Learning
Sanjeev Arora, Zhiyuan Li, A. Panigrahi (19 May 2022) [MLT]

Understanding the unstable convergence of gradient descent
Kwangjun Ahn, Jingzhao Zhang, S. Sra (03 Apr 2022)

What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
Zhiyuan Li, Tianhao Wang, Sanjeev Arora (13 Oct 2021) [MLT]

Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar (26 Feb 2021) [ODL]

Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks
Yikai Wu, Xingyu Zhu, Chenwei Wu, Annie Wang, Rong Ge (08 Oct 2020)

Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret, Ariel Kleiner, H. Mobahi, Behnam Neyshabur (03 Oct 2020) [AAML]

The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks
Wei Hu, Lechao Xiao, Ben Adlam, Jeffrey Pennington (25 Jun 2020)

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari (04 Mar 2020) [ODL]

The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras (21 Feb 2020)

PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, ..., Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala (03 Dec 2019) [ODL]

Pathological spectra of the Fisher information metric and its variants in deep neural networks
Ryo Karakida, S. Akaho, S. Amari (14 Oct 2019)

Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks
Ziwei Ji, Matus Telgarsky (26 Sep 2019)

The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks
Ryo Karakida, S. Akaho, S. Amari (07 Jun 2019)

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
Jaehoon Lee, Lechao Xiao, S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Narain Sohl-Dickstein, Jeffrey Pennington (18 Feb 2019)

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
Sanjeev Arora, S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang (24 Jan 2019) [MLT]

Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians
Vardan Papyan (24 Jan 2019)

On Lazy Training in Differentiable Programming
Lénaïc Chizat, Edouard Oyallon, Francis R. Bach (19 Dec 2018)

Gradient Descent Finds Global Minima of Deep Neural Networks
S. Du, Jason D. Lee, Haochuan Li, Liwei Wang, Xiyu Zhai (09 Nov 2018) [ODL]

Gradient Descent Provably Optimizes Over-parameterized Neural Networks
S. Du, Xiyu Zhai, Barnabás Póczós, Aarti Singh (04 Oct 2018) [MLT, ODL]

Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Arthur Jacot, Franck Gabriel, Clément Hongler (20 Jun 2018)

A Walk with SGD
Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio (24 Feb 2018)

Empirical Analysis of the Hessian of Over-Parametrized Neural Networks
Levent Sagun, Utku Evci, V. U. Güney, Yann N. Dauphin, Léon Bottou (14 Jun 2017)

Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond
Levent Sagun, Léon Bottou, Yann LeCun (22 Nov 2016) [UQCV]

Optimization Methods for Large-Scale Machine Learning
Léon Bottou, Frank E. Curtis, J. Nocedal (15 Jun 2016)

Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (10 Dec 2015) [MedIm]