Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Yuanzhi Li, Colin Wei, Tengyu Ma
arXiv:1907.04595 (10 July 2019)
Papers citing "Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks" (50 of 71 papers shown):
ICE-Pruning: An Iterative Cost-Efficient Pruning Pipeline for Deep Neural Networks. Wenhao Hu, Paul Henderson, José Cano (12 May 2025)
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks. Pierfrancesco Beneventano, Blake Woodworth (15 Jan 2025)
Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks. Amit Peleg, Matthias Hein (04 Jul 2024)
Implicit Bias of AdamW: ℓ∞ Norm Constrained Optimization. Shuo Xie, Zhiyuan Li (05 Apr 2024)
Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets. Lorenzo Brigato, Stavroula Mougiakakou (08 Mar 2024)
StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization. Shida Wang, Qianxiao Li (24 Nov 2023)
Large Learning Rates Improve Generalization: But How Large Are We Talking About? E. Lobacheva, Eduard Pockonechnyy, M. Kodryan, Dmitry Vetrov (19 Nov 2023)
Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective. Yifei Wang, Liangchen Li, Jiansheng Yang, Zhouchen Lin, Yisen Wang (30 Oct 2023)
Layer-wise Linear Mode Connectivity. Linara Adilova, Maksym Andriushchenko, Michael Kamp, Asja Fischer, Martin Jaggi (13 Jul 2023)
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models. Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner (12 Jul 2023)
Loss Spike in Training Neural Networks. Zhongwang Zhang, Z. Xu (20 May 2023)
Learning Trajectories are Generalization Indicators. Jingwen Fu, Zhizheng Zhang, Dacheng Yin, Yan Lu, Nanning Zheng (25 Apr 2023)
A Modern Look at the Relationship between Sharpness and Generalization. Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion (14 Feb 2023)
Do Neural Networks Generalize from Self-Averaging Sub-classifiers in the Same Way As Adaptive Boosting? Michael Sun, Peter Chatain (14 Feb 2023)
On a continuous time model of gradient descent dynamics and instability in deep learning. Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin (03 Feb 2023)
Catapult Dynamics and Phase Transitions in Quadratic Nets. David Meltzer, Junyu Liu (18 Jan 2023)
Beyond spectral gap (extended): The role of the topology in decentralized learning. Thijs Vogels, Hadrien Hendrikx, Martin Jaggi (05 Jan 2023)
Learning threshold neurons via the "edge of stability". Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang (14 Dec 2022)
Establishing a stronger baseline for lightweight contrastive models. Wenye Lin, Yifeng Ding, Zhixiong Cao, Haitao Zheng (14 Dec 2022)
Disentangling the Mechanisms Behind Implicit Regularization in SGD. Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Chase Lipton (29 Nov 2022)
ModelDiff: A Framework for Comparing Learning Algorithms. Harshay Shah, Sung Min Park, Andrew Ilyas, A. Madry (22 Nov 2022)
RSC: Accelerating Graph Neural Networks Training via Randomized Sparse Computations. Zirui Liu, Sheng-Wei Chen, Kaixiong Zhou, Daochen Zha, Xiao Huang, Xia Hu (19 Oct 2022)
SGD with Large Step Sizes Learns Sparse Features. Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion (11 Oct 2022)
On skip connections and normalisation layers in deep optimisation. L. MacDonald, Jack Valmadre, Hemanth Saratchandran, Simon Lucey (10 Oct 2022)
Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty. Thomas George, Guillaume Lajoie, A. Baratin (19 Sep 2022)
Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent. Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora (08 Jul 2022)
When Does Re-initialization Work? Sheheryar Zaidi, Tudor Berariu, Hyunjik Kim, J. Bornschein, Claudia Clopath, Yee Whye Teh, Razvan Pascanu (20 Jun 2022)
Beyond spectral gap: The role of the topology in decentralized learning. Thijs Vogels, Hadrien Hendrikx, Martin Jaggi (07 Jun 2022)
The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning. Zixin Wen, Yuanzhi Li (12 May 2022)
High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation. Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang (03 May 2022)
Biologically-inspired neuronal adaptation improves learning in neural networks. Yoshimasa Kubo, Eric Chalmers, Artur Luczak (08 Apr 2022)
On the Benefits of Large Learning Rates for Kernel Methods. Gaspard Beugnot, Julien Mairal, Alessandro Rudi (28 Feb 2022)
Optimal learning rate schedules in high-dimensional non-convex optimization problems. Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli (09 Feb 2022)
Weight Expansion: A New Perspective on Dropout and Generalization. Gao Jin, Xinping Yi, Pengfei Yang, Lijun Zhang, S. Schewe, Xiaowei Huang (23 Jan 2022)
Partial Model Averaging in Federated Learning: Performance Guarantees and Benefits. Sunwoo Lee, Anit Kumar Sahu, Chaoyang He, Salman Avestimehr (11 Jan 2022)
DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization. Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine (09 Dec 2021)
Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect. Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao (07 Oct 2021)
Stochastic Anderson Mixing for Nonconvex Stochastic Optimization. Fu Wei, Chenglong Bao, Yang Liu (04 Oct 2021)
Stochastic Training is Not Necessary for Generalization. Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein (29 Sep 2021)
Adaptive Margin Circle Loss for Speaker Verification. Runqiu Xiao (15 Jun 2021)
Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization. Zeke Xie, Li-xin Yuan, Zhanxing Zhu, Masashi Sugiyama (31 Mar 2021)
How to decay your learning rate. Aitor Lewkowycz (23 Mar 2021)
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs). Zhiyuan Li, Sadhika Malladi, Sanjeev Arora (24 Feb 2021)
Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization. Tianyi Liu, Yan Li, S. Wei, Enlu Zhou, T. Zhao (24 Feb 2021)
Open-World Semi-Supervised Learning. Kaidi Cao, Maria Brbic, J. Leskovec (06 Feb 2021)
Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise. Spencer Frei, Yuan Cao, Quanquan Gu (04 Jan 2021)
FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training. Y. Fu, Haoran You, Yang Katie Zhao, Yue Wang, Chaojian Li, K. Gopalakrishnan, Zhangyang Wang, Yingyan Lin (24 Dec 2020)
Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning. Zeyuan Allen-Zhu, Yuanzhi Li (17 Dec 2020)
Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent. Kangqiao Liu, Liu Ziyin, Masakuni Ueda (07 Dec 2020)
A Random Matrix Theory Approach to Damping in Deep Learning. Diego Granziol, Nicholas P. Baskerville (15 Nov 2020)