High dimensional analysis reveals conservative sharpening and a stochastic edge of stability
Atish Agarwala, Jeffrey Pennington
30 April 2024 (arXiv:2404.19261)

Papers citing "High dimensional analysis reveals conservative sharpening and a stochastic edge of stability"

32 papers

Stepping on the Edge: Curvature Aware Learning Rate Tuners
Vincent Roulet, Atish Agarwala, Jean-Bastien Grill, Grzegorz Swirszcz, Mathieu Blondel, Fabian Pedregosa
08 Jul 2024

Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun
25 May 2024

Neglected Hessian component explains mysteries in Sharpness regularization
Yann N. Dauphin, Atish Agarwala, Hossein Mobahi
19 Jan 2024

On the Interplay Between Stepsize Tuning and Progressive Sharpening
Vincent Roulet, Atish Agarwala, Fabian Pedregosa
30 Nov 2023

Hitting the High-Dimensional Notes: An ODE for SGD learning dynamics on GLMs and multi-index models
Elizabeth Collins-Woodfin, Courtney Paquette, Elliot Paquette, Inbar Seroussi
17 Aug 2023

Exact Mean Square Linear Stability Analysis for SGD
Rotem Mulayoff, T. Michaeli
13 Jun 2023

The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent
Lei Wu, Weijie J. Su
27 May 2023

SAM operates far from home: eigenvalue regularization as a dynamical phenomenon
Atish Agarwala, Yann N. Dauphin
17 Feb 2023

From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks
Luca Arnaboldi, Ludovic Stephan, Florent Krzakala, Bruno Loureiro
12 Feb 2023

Second-order regression models exhibit progressive sharpening to the edge of stability
Atish Agarwala, Fabian Pedregosa, Jeffrey Pennington
10 Oct 2022

Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability
Alexandru Damian, Eshaan Nichani, Jason D. Lee
30 Sep 2022

Adaptive Gradient Methods at the Edge of Stability
Jeremy M. Cohen, Behrooz Ghorbani, Shankar Krishnan, Naman Agarwal, Sourabh Medapati, ..., Daniel Suo, David E. Cardoze, Zachary Nado, George E. Dahl, Justin Gilmer
29 Jul 2022

The alignment property of SGD noise and how it helps select flat minima: A stability analysis
Lei Wu, Mingze Wang, Weijie Su
06 Jul 2022

Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions
Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington
15 Jun 2022

High-dimensional limit theorems for SGD: Effective dynamics and critical scaling
Gerard Ben Arous, Reza Gheissari, Aukosh Jagannath
08 Jun 2022

Quadratic models for understanding catapult dynamics of neural networks
Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, M. Belkin
24 May 2022

Homogenization of SGD in high-dimensions: Exact dynamics and generalization properties
Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington
14 May 2022

On Linear Stability of SGD and Input-Smoothness of Neural Networks
Chao Ma, Lexing Ying
27 May 2021

Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar
26 Feb 2021

SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality
Courtney Paquette, Kiwon Lee, Fabian Pedregosa, Elliot Paquette
08 Feb 2021

Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret, Ariel Kleiner, H. Mobahi, Behnam Neyshabur
03 Oct 2020

The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization
Ben Adlam, Jeffrey Pennington
15 Aug 2020

The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras
21 Feb 2020

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
Jaehoon Lee, Lechao Xiao, S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Narain Sohl-Dickstein, Jeffrey Pennington
18 Feb 2019

Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit
Song Mei, Theodor Misiakiewicz, Andrea Montanari
16 Feb 2019

An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
Behrooz Ghorbani, Shankar Krishnan, Ying Xiao
29 Jan 2019

Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue, Jaehoon Lee, J. Antognini, J. Mamou, J. Ketterling, Yao Wang
08 Nov 2018

Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Arthur Jacot, Franck Gabriel, Clément Hongler
20 Jun 2018

Three Factors Influencing Minima in SGD
Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey
13 Nov 2017

Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
01 Nov 2017

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He
08 Jun 2017

Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
10 Dec 2015