High dimensional analysis reveals conservative sharpening and a stochastic edge of stability
Atish Agarwala, Jeffrey Pennington
30 April 2024 (arXiv:2404.19261)

Papers citing "High dimensional analysis reveals conservative sharpening and a stochastic edge of stability"

32 papers

Stepping on the Edge: Curvature Aware Learning Rate Tuners
Vincent Roulet, Atish Agarwala, Jean-Bastien Grill, Grzegorz Swirszcz, Mathieu Blondel, Fabian Pedregosa
08 Jul 2024

Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun
25 May 2024

Neglected Hessian component explains mysteries in Sharpness regularization
Yann N. Dauphin, Atish Agarwala, Hossein Mobahi
19 Jan 2024

On the Interplay Between Stepsize Tuning and Progressive Sharpening
Vincent Roulet, Atish Agarwala, Fabian Pedregosa
30 Nov 2023

Hitting the High-Dimensional Notes: An ODE for SGD learning dynamics on GLMs and multi-index models
Elizabeth Collins-Woodfin, Courtney Paquette, Elliot Paquette, Inbar Seroussi
17 Aug 2023

Exact Mean Square Linear Stability Analysis for SGD
Rotem Mulayoff, T. Michaeli
13 Jun 2023

The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent
Lei Wu, Weijie J. Su
27 May 2023

SAM operates far from home: eigenvalue regularization as a dynamical phenomenon
Atish Agarwala, Yann N. Dauphin
17 Feb 2023

From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks
Luca Arnaboldi, Ludovic Stephan, Florent Krzakala, Bruno Loureiro
12 Feb 2023

Second-order regression models exhibit progressive sharpening to the edge of stability
Atish Agarwala, Fabian Pedregosa, Jeffrey Pennington
10 Oct 2022

Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability
Alexandru Damian, Eshaan Nichani, Jason D. Lee
30 Sep 2022

Adaptive Gradient Methods at the Edge of Stability
Jeremy M. Cohen, Behrooz Ghorbani, Shankar Krishnan, Naman Agarwal, Sourabh Medapati, ..., Daniel Suo, David E. Cardoze, Zachary Nado, George E. Dahl, Justin Gilmer
29 Jul 2022

The alignment property of SGD noise and how it helps select flat minima: A stability analysis
Lei Wu, Mingze Wang, Weijie Su
06 Jul 2022

Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions
Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington
15 Jun 2022

High-dimensional limit theorems for SGD: Effective dynamics and critical scaling
Gerard Ben Arous, Reza Gheissari, Aukosh Jagannath
08 Jun 2022

Quadratic models for understanding catapult dynamics of neural networks
Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, M. Belkin
24 May 2022

Homogenization of SGD in high-dimensions: Exact dynamics and generalization properties
Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington
14 May 2022

On Linear Stability of SGD and Input-Smoothness of Neural Networks
Chao Ma, Lexing Ying
27 May 2021

Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar
26 Feb 2021

SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality
Courtney Paquette, Kiwon Lee, Fabian Pedregosa, Elliot Paquette
08 Feb 2021

Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret, Ariel Kleiner, H. Mobahi, Behnam Neyshabur
03 Oct 2020

The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization
Ben Adlam, Jeffrey Pennington
15 Aug 2020

The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras
21 Feb 2020

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
Jaehoon Lee, Lechao Xiao, S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Narain Sohl-Dickstein, Jeffrey Pennington
18 Feb 2019

Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit
Song Mei, Theodor Misiakiewicz, Andrea Montanari
16 Feb 2019

An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
Behrooz Ghorbani, Shankar Krishnan, Ying Xiao
29 Jan 2019

Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue, Jaehoon Lee, J. Antognini, J. Mamou, J. Ketterling, Yao Wang
08 Nov 2018

Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Arthur Jacot, Franck Gabriel, Clément Hongler
20 Jun 2018

Three Factors Influencing Minima in SGD
Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey
13 Nov 2017

Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
01 Nov 2017

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He
08 Jun 2017

Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
10 Dec 2015