Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
arXiv:2110.03677 | 7 October 2021
Yuqing Wang, Minshuo Chen, Tuo Zhao, Molei Tao
Community: AI4CE
Papers citing "Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect" (35 / 35 papers shown)
| Title | Authors | Community | Metrics | Date |
|---|---|---|---|---|
| Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes | Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett | | 51 / 0 / 0 | 05 Apr 2025 |
| Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos | Dayal Singh Kalra, Tianyu He, Maissam Barkeshli | | 95 / 4 / 0 | 17 Feb 2025 |
| Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks | Pierfrancesco Beneventano, Blake Woodworth | MLT | 77 / 1 / 0 | 15 Jan 2025 |
| Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably | Tianyi Liu, Yan Li, Enlu Zhou, Tuo Zhao | | 53 / 1 / 0 | 07 Feb 2022 |
| Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data | Yan Li, Caleb Ju, Ethan X. Fang, Tuo Zhao | | 32 / 9 / 0 | 15 Aug 2021 |
| Global Convergence of Gradient Descent for Asymmetric Low-Rank Matrix Factorization | Tian-Chun Ye, Simon S. Du | | 41 / 47 / 0 | 27 Jun 2021 |
| Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization | Tianyi Liu, Yan Li, S. Wei, Enlu Zhou, Tuo Zhao | | 38 / 13 / 0 | 24 Feb 2021 |
| Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning | Pan Zhou, Jiashi Feng, Chao Ma, Caiming Xiong, Guosheng Lin, Weinan E | | 71 / 231 / 0 | 12 Oct 2020 |
| A Comparison of Optimization Algorithms for Deep Learning | Derya Soydaner | | 105 / 155 / 0 | 28 Jul 2020 |
| Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems | Preetum Nakkiran | MLT | 44 / 21 / 0 | 15 May 2020 |
| The large learning rate phase of deep learning: the catapult mechanism | Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari | ODL | 182 / 238 / 0 | 04 Mar 2020 |
| The Landscape of Matrix Factorization Revisited | Hossein Valavi, Sulin Liu, Peter J. Ramadge | | 52 / 5 / 0 | 27 Feb 2020 |
| Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function | Lingkai Kong, Molei Tao | | 33 / 22 / 0 | 14 Feb 2020 |
| An Exponential Learning Rate Schedule for Deep Learning | Zhiyuan Li, Sanjeev Arora | | 42 / 214 / 0 | 16 Oct 2019 |
| On the Variance of the Adaptive Learning Rate and Beyond | Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han | ODL | 198 / 1,894 / 0 | 08 Aug 2019 |
| Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks | Yuanzhi Li, Colin Wei, Tengyu Ma | | 42 / 295 / 0 | 10 Jul 2019 |
| A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation | Akhilesh Deepak Gotmare, Nitish Shirish Keskar, Caiming Xiong, Richard Socher | ODL | 57 / 275 / 0 | 29 Oct 2018 |
| Gradient descent aligns the layers of deep linear networks | Ziwei Ji, Matus Telgarsky | | 111 / 250 / 0 | 04 Oct 2018 |
| Neural Tangent Kernel: Convergence and Generalization in Neural Networks | Arthur Jacot, Franck Gabriel, Clément Hongler | | 213 / 3,160 / 0 | 20 Jun 2018 |
| On Landscape of Lagrangian Functions and Stochastic Search for Constrained Nonconvex Optimization | Zhehui Chen, Xingguo Li, Lin F. Yang, Jarvis Haupt, Tuo Zhao | | 21 / 4 / 0 | 13 Jun 2018 |
| Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced | Simon S. Du, Wei Hu, Jason D. Lee | MLT | 121 / 239 / 0 | 04 Jun 2018 |
| A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay | Leslie N. Smith | | 271 / 1,026 / 0 | 26 Mar 2018 |
| Three Factors Influencing Minima in SGD | Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey | | 76 / 459 / 0 | 13 Nov 2017 |
| The Implicit Bias of Gradient Descent on Separable Data | Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro | | 115 / 908 / 0 | 27 Oct 2017 |
| Implicit Regularization in Matrix Factorization | Suriya Gunasekar, Blake E. Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro | | 75 / 490 / 0 | 25 May 2017 |
| No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis | Rong Ge, Chi Jin, Yi Zheng | | 113 / 435 / 0 | 03 Apr 2017 |
| Optimization Methods for Large-Scale Machine Learning | Léon Bottou, Frank E. Curtis, Jorge Nocedal | | 195 / 3,198 / 0 | 15 Jun 2016 |
| A Geometric Analysis of Phase Retrieval | Ju Sun, Qing Qu, John N. Wright | | 86 / 524 / 0 | 22 Feb 2016 |
| Dropping Convexity for Faster Semi-definite Optimization | Srinadh Bhojanapalli, Anastasios Kyrillidis, Sujay Sanghavi | | 63 / 173 / 0 | 14 Sep 2015 |
| Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees | Yudong Chen, Martin J. Wainwright | | 129 / 318 / 0 | 10 Sep 2015 |
| Cyclical Learning Rates for Training Neural Networks | Leslie N. Smith | ODL | 152 / 2,515 / 0 | 03 Jun 2015 |
| In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning | Behnam Neyshabur, Ryota Tomioka, Nathan Srebro | AI4CE | 86 / 655 / 0 | 20 Dec 2014 |
| Phase Retrieval via Wirtinger Flow: Theory and Algorithms | Emmanuel Candes, Xiaodong Li, Mahdi Soltanolkotabi | | 166 / 1,283 / 0 | 03 Jul 2014 |
| Understanding Alternating Minimization for Matrix Completion | Moritz Hardt | | 79 / 258 / 0 | 03 Dec 2013 |
| Matrix Completion from a Few Entries | Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh | | 331 / 1,242 / 0 | 20 Jan 2009 |