Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao
7 October 2021
Papers citing "Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect"

35 / 35 papers shown
1. Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
   Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett (05 Apr 2025)
2. Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
   Dayal Singh Kalra, Tianyu He, M. Barkeshli (17 Feb 2025)
3. Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
   Pierfrancesco Beneventano, Blake Woodworth (15 Jan 2025)
4. Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably
   Tianyi Liu, Yan Li, Enlu Zhou, Tuo Zhao (07 Feb 2022)
5. Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data
   Yan Li, Caleb Ju, Ethan X. Fang, T. Zhao (15 Aug 2021)
6. Global Convergence of Gradient Descent for Asymmetric Low-Rank Matrix Factorization
   Tian-Chun Ye, S. Du (27 Jun 2021)
7. Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization
   Tianyi Liu, Yan Li, S. Wei, Enlu Zhou, T. Zhao (24 Feb 2021)
8. Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning
   Pan Zhou, Jiashi Feng, Chao Ma, Caiming Xiong, Guosheng Lin, E. Weinan (12 Oct 2020)
9. A Comparison of Optimization Algorithms for Deep Learning
   Derya Soydaner (28 Jul 2020)
10. Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
    Preetum Nakkiran (15 May 2020)
11. The large learning rate phase of deep learning: the catapult mechanism
    Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari (04 Mar 2020)
12. The Landscape of Matrix Factorization Revisited
    Hossein Valavi, Sulin Liu, Peter J. Ramadge (27 Feb 2020)
13. Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function
    Lingkai Kong, Molei Tao (14 Feb 2020)
14. An Exponential Learning Rate Schedule for Deep Learning
    Zhiyuan Li, Sanjeev Arora (16 Oct 2019)
15. On the Variance of the Adaptive Learning Rate and Beyond
    Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han (08 Aug 2019)
16. Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
    Yuanzhi Li, Colin Wei, Tengyu Ma (10 Jul 2019)
17. A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation
    Akhilesh Deepak Gotmare, N. Keskar, Caiming Xiong, R. Socher (29 Oct 2018)
18. Gradient descent aligns the layers of deep linear networks
    Ziwei Ji, Matus Telgarsky (04 Oct 2018)
19. Neural Tangent Kernel: Convergence and Generalization in Neural Networks
    Arthur Jacot, Franck Gabriel, Clément Hongler (20 Jun 2018)
20. On Landscape of Lagrangian Functions and Stochastic Search for Constrained Nonconvex Optimization
    Zhehui Chen, Xingguo Li, Lin F. Yang, Jarvis Haupt, T. Zhao (13 Jun 2018)
21. Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced
    S. Du, Wei Hu, Jason D. Lee (04 Jun 2018)
22. A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
    L. Smith (26 Mar 2018)
23. Three Factors Influencing Minima in SGD
    Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey (13 Nov 2017)
24. The Implicit Bias of Gradient Descent on Separable Data
    Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro (27 Oct 2017)
25. Implicit Regularization in Matrix Factorization
    Suriya Gunasekar, Blake E. Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro (25 May 2017)
26. No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis
    Rong Ge, Chi Jin, Yi Zheng (03 Apr 2017)
27. Optimization Methods for Large-Scale Machine Learning
    Léon Bottou, Frank E. Curtis, J. Nocedal (15 Jun 2016)
28. A Geometric Analysis of Phase Retrieval
    Ju Sun, Qing Qu, John N. Wright (22 Feb 2016)
29. Dropping Convexity for Faster Semi-definite Optimization
    Srinadh Bhojanapalli, Anastasios Kyrillidis, Sujay Sanghavi (14 Sep 2015)
30. Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees
    Yudong Chen, Martin J. Wainwright (10 Sep 2015)
31. Cyclical Learning Rates for Training Neural Networks
    L. Smith (03 Jun 2015)
32. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning
    Behnam Neyshabur, Ryota Tomioka, Nathan Srebro (20 Dec 2014)
33. Phase Retrieval via Wirtinger Flow: Theory and Algorithms
    Emmanuel Candes, Xiaodong Li, Mahdi Soltanolkotabi (03 Jul 2014)
34. Understanding Alternating Minimization for Matrix Completion
    Moritz Hardt (03 Dec 2013)
35. Matrix Completion from a Few Entries
    Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh (20 Jan 2009)