Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes

24 April 2022
Chao Ma, D. Kunin, Lei Wu, Lexing Ying
arXiv: 2204.11326

Papers citing "Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes"

23 papers shown:

Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett. 05 Apr 2025.

Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Dayal Singh Kalra, Tianyu He, M. Barkeshli. 17 Feb 2025.

Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan. 14 Oct 2024. [AAML]

Understanding the unstable convergence of gradient descent
Kwangjun Ahn, J.N. Zhang, S. Sra. 03 Apr 2022.

What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
Zhiyuan Li, Tianhao Wang, Sanjeev Arora. 13 Oct 2021. [MLT]

Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao. 07 Oct 2021. [AI4CE]

The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins. 19 Jul 2021.

Label Noise SGD Provably Prefers Flat Global Minimizers
Alexandru Damian, Tengyu Ma, Jason D. Lee. 11 Jun 2021. [NoLa]

On Linear Stability of SGD and Input-Smoothness of Neural Networks
Chao Ma, Lexing Ying. 27 May 2021. [MLT]

Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar. 26 Feb 2021. [ODL]

Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function
Lingkai Kong, Molei Tao. 14 Feb 2020.

Loss Landscape Sightseeing with Multi-Point Optimization
Ivan Skorokhodov, Andrey Kravchenko. 09 Oct 2019. [3DPC]

At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?
Niv Giladi, Mor Shpigel Nacson, Elad Hoffer, Daniel Soudry. 26 Sep 2019.

Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process
Guy Blanc, Neha Gupta, Gregory Valiant, Paul Valiant. 19 Apr 2019.

A Comparative Analysis of the Optimization and Generalization Property of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics
E. Weinan, Chao Ma, Lei Wu. 08 Apr 2019. [MLT]

On Lazy Training in Differentiable Programming
Lénaïc Chizat, Edouard Oyallon, Francis R. Bach. 19 Dec 2018.

Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent
Xiaowu Dai, Yuhua Zhu. 03 Dec 2018.

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu. 21 Nov 2018. [ODL]

A Convergence Theory for Deep Learning via Over-Parameterization
Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song. 09 Nov 2018. [AI4CE, ODL]

Gradient Descent Provably Optimizes Over-parameterized Neural Networks
S. Du, Xiyu Zhai, Barnabás Póczós, Aarti Singh. 04 Oct 2018. [MLT, ODL]

Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Arthur Jacot, Franck Gabriel, Clément Hongler. 20 Jun 2018.

An Alternative View: When Does SGD Escape Local Minima?
Robert D. Kleinberg, Yuanzhi Li, Yang Yuan. 17 Feb 2018. [MLT]

Sharp Minima Can Generalize For Deep Nets
Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio. 15 Mar 2017. [ODL]