Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function
Lingkai Kong, Molei Tao
arXiv:2002.06189, 14 February 2020
Papers citing "Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function" (13 of 13 papers shown):
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett (05 Apr 2025)
The boundary of neural network trainability is fractal
Jascha Narain Sohl-Dickstein (09 Feb 2024)
From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression
Xuxing Chen, Krishnakumar Balasubramanian, Promit Ghosal, Bhavya Agrawalla (02 Oct 2023)
Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability
Jingfeng Wu, Vladimir Braverman, Jason D. Lee (19 May 2023)
On the generalization of learning algorithms that do not converge
N. Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie Jegelka (16 Aug 2022) [MLT]
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora (14 Jun 2022) [FAtt]
Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes
Chao Ma, D. Kunin, Lei Wu, Lexing Ying (24 Apr 2022)
Gradients are Not All You Need
Luke Metz, C. Freeman, S. Schoenholz, Tal Kachman (10 Nov 2021)
Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao (07 Oct 2021) [AI4CE]
Stochastic Training is Not Necessary for Generalization
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein (29 Sep 2021)
The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari (04 Mar 2020) [ODL]
A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights
Weijie Su, Stephen P. Boyd, Emmanuel J. Candes (04 Mar 2015)
Norm-Based Capacity Control in Neural Networks
Behnam Neyshabur, Ryota Tomioka, Nathan Srebro (27 Feb 2015)