Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks

Yuanzhi Li, Colin Wei, Tengyu Ma (10 July 2019)

Papers citing "Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks"

Showing 50 of 71 citing papers.
ICE-Pruning: An Iterative Cost-Efficient Pruning Pipeline for Deep Neural Networks
Wenhao Hu, Paul Henderson, José Cano (12 May 2025)

Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
Pierfrancesco Beneventano, Blake Woodworth (15 Jan 2025)

Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks
Amit Peleg, Matthias Hein (04 Jul 2024)

Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
Shuo Xie, Zhiyuan Li (05 Apr 2024)

Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets
Lorenzo Brigato, Stavroula Mougiakakou (08 Mar 2024)

StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization
Shida Wang, Qianxiao Li (24 Nov 2023)

Large Learning Rates Improve Generalization: But How Large Are We Talking About?
E. Lobacheva, Eduard Pockonechnyy, M. Kodryan, Dmitry Vetrov (19 Nov 2023)

Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective
Yifei Wang, Liangchen Li, Jiansheng Yang, Zhouchen Lin, Yisen Wang (30 Oct 2023)

Layer-wise Linear Mode Connectivity
Linara Adilova, Maksym Andriushchenko, Michael Kamp, Asja Fischer, Martin Jaggi (13 Jul 2023)

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner (12 Jul 2023)

Loss Spike in Training Neural Networks
Zhongwang Zhang, Z. Xu (20 May 2023)

Learning Trajectories are Generalization Indicators
Jingwen Fu, Zhizheng Zhang, Dacheng Yin, Yan Lu, Nanning Zheng (25 Apr 2023)

A Modern Look at the Relationship between Sharpness and Generalization
Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion (14 Feb 2023)

Do Neural Networks Generalize from Self-Averaging Sub-classifiers in the Same Way As Adaptive Boosting?
Michael Sun, Peter Chatain (14 Feb 2023)

On a continuous time model of gradient descent dynamics and instability in deep learning
Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin (03 Feb 2023)

Catapult Dynamics and Phase Transitions in Quadratic Nets
David Meltzer, Junyu Liu (18 Jan 2023)

Beyond spectral gap (extended): The role of the topology in decentralized learning
Thijs Vogels, Hadrien Hendrikx, Martin Jaggi (05 Jan 2023)

Learning threshold neurons via the "edge of stability"
Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang (14 Dec 2022)

Establishing a stronger baseline for lightweight contrastive models
Wenye Lin, Yifeng Ding, Zhixiong Cao, Haitao Zheng (14 Dec 2022)

Disentangling the Mechanisms Behind Implicit Regularization in SGD
Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Chase Lipton (29 Nov 2022)

ModelDiff: A Framework for Comparing Learning Algorithms
Harshay Shah, Sung Min Park, Andrew Ilyas, A. Madry (22 Nov 2022)

RSC: Accelerating Graph Neural Networks Training via Randomized Sparse Computations
Zirui Liu, Sheng-Wei Chen, Kaixiong Zhou, Daochen Zha, Xiao Huang, Xia Hu (19 Oct 2022)

SGD with Large Step Sizes Learns Sparse Features
Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion (11 Oct 2022)

On skip connections and normalisation layers in deep optimisation
L. MacDonald, Jack Valmadre, Hemanth Saratchandran, Simon Lucey (10 Oct 2022)

Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty
Thomas George, Guillaume Lajoie, A. Baratin (19 Sep 2022)

Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora (08 Jul 2022)

When Does Re-initialization Work?
Sheheryar Zaidi, Tudor Berariu, Hyunjik Kim, J. Bornschein, Claudia Clopath, Yee Whye Teh, Razvan Pascanu (20 Jun 2022)

Beyond spectral gap: The role of the topology in decentralized learning
Thijs Vogels, Hadrien Hendrikx, Martin Jaggi (07 Jun 2022)

The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning
Zixin Wen, Yuanzhi Li (12 May 2022)

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang (03 May 2022)

Biologically-inspired neuronal adaptation improves learning in neural networks
Yoshimasa Kubo, Eric Chalmers, Artur Luczak (08 Apr 2022)

On the Benefits of Large Learning Rates for Kernel Methods
Gaspard Beugnot, Julien Mairal, Alessandro Rudi (28 Feb 2022)

Optimal learning rate schedules in high-dimensional non-convex optimization problems
Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli (09 Feb 2022)

Weight Expansion: A New Perspective on Dropout and Generalization
Gao Jin, Xinping Yi, Pengfei Yang, Lijun Zhang, S. Schewe, Xiaowei Huang (23 Jan 2022)

Partial Model Averaging in Federated Learning: Performance Guarantees and Benefits
Sunwoo Lee, Anit Kumar Sahu, Chaoyang He, Salman Avestimehr (11 Jan 2022)

DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization
Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine (09 Dec 2021)

Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao (07 Oct 2021)

Stochastic Anderson Mixing for Nonconvex Stochastic Optimization
Fu Wei, Chenglong Bao, Yang Liu (04 Oct 2021)

Stochastic Training is Not Necessary for Generalization
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein (29 Sep 2021)

Adaptive Margin Circle Loss for Speaker Verification
Runqiu Xiao (15 Jun 2021)

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
Zeke Xie, Li-xin Yuan, Zhanxing Zhu, Masashi Sugiyama (31 Mar 2021)

How to decay your learning rate
Aitor Lewkowycz (23 Mar 2021)

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li, Sadhika Malladi, Sanjeev Arora (24 Feb 2021)

Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization
Tianyi Liu, Yan Li, S. Wei, Enlu Zhou, T. Zhao (24 Feb 2021)

Open-World Semi-Supervised Learning
Kaidi Cao, Maria Brbic, J. Leskovec (06 Feb 2021)

Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise
Spencer Frei, Yuan Cao, Quanquan Gu (04 Jan 2021)

FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
Y. Fu, Haoran You, Yang Katie Zhao, Yue Wang, Chaojian Li, K. Gopalakrishnan, Zhangyang Wang, Yingyan Lin (24 Dec 2020)

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning
Zeyuan Allen-Zhu, Yuanzhi Li (17 Dec 2020)

Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent
Kangqiao Liu, Liu Ziyin, Masakuni Ueda (07 Dec 2020)

A Random Matrix Theory Approach to Damping in Deep Learning
Diego Granziol, Nicholas P. Baskerville (15 Nov 2020)