Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes
Chao Ma, D. Kunin, Lei Wu, Lexing Ying
arXiv:2204.11326 (v3, latest). 24 April 2022.
Papers citing "Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes" (23 papers shown):
- Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes. Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett. 05 Apr 2025.
- Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos. Dayal Singh Kalra, Tianyu He, M. Barkeshli. 17 Feb 2025.
- Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training. Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan. 14 Oct 2024.
- Understanding the unstable convergence of gradient descent. Kwangjun Ahn, J.N. Zhang, S. Sra. 03 Apr 2022.
- What Happens after SGD Reaches Zero Loss? --A Mathematical Framework. Zhiyuan Li, Tianhao Wang, Sanjeev Arora. 13 Oct 2021.
- Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect. Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao. 07 Oct 2021.
- The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion. D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins. 19 Jul 2021.
- Label Noise SGD Provably Prefers Flat Global Minimizers. Alexandru Damian, Tengyu Ma, Jason D. Lee. 11 Jun 2021.
- On Linear Stability of SGD and Input-Smoothness of Neural Networks. Chao Ma, Lexing Ying. 27 May 2021.
- Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability. Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar. 26 Feb 2021.
- Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function. Lingkai Kong, Molei Tao. 14 Feb 2020.
- Loss Landscape Sightseeing with Multi-Point Optimization. Ivan Skorokhodov, Andrey Kravchenko. 09 Oct 2019.
- At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks? Niv Giladi, Mor Shpigel Nacson, Elad Hoffer, Daniel Soudry. 26 Sep 2019.
- Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process. Guy Blanc, Neha Gupta, Gregory Valiant, Paul Valiant. 19 Apr 2019.
- A Comparative Analysis of the Optimization and Generalization Property of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics. E. Weinan, Chao Ma, Lei Wu. 08 Apr 2019.
- On Lazy Training in Differentiable Programming. Lénaïc Chizat, Edouard Oyallon, Francis R. Bach. 19 Dec 2018.
- Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent. Xiaowu Dai, Yuhua Zhu. 03 Dec 2018.
- Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks. Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu. 21 Nov 2018.
- A Convergence Theory for Deep Learning via Over-Parameterization. Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song. 09 Nov 2018.
- Gradient Descent Provably Optimizes Over-parameterized Neural Networks. S. Du, Xiyu Zhai, Barnabás Póczós, Aarti Singh. 04 Oct 2018.
- Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Arthur Jacot, Franck Gabriel, Clément Hongler. 20 Jun 2018.
- An Alternative View: When Does SGD Escape Local Minima? Robert D. Kleinberg, Yuanzhi Li, Yang Yuan. 17 Feb 2018.
- Sharp Minima Can Generalize For Deep Nets. Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio. 15 Mar 2017.