An Exponential Learning Rate Schedule for Deep Learning

16 October 2019

Papers citing "An Exponential Learning Rate Schedule for Deep Learning"

41 / 41 papers shown

Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training Shane Bergsma Nolan Dey Gurpreet Gosal Gavia Gray Daria Soboleva Joel Hestness 24 0 0 19 May 2025
A Unified Framework for Neural Computation and Learning Over Time S. Melacci Alessandro Betti Michele Casoni Tommaso Guidi Matteo Tiezzi Marco Gori AI4TS 35 0 0 18 Sep 2024
Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning Amin Karimi Monsefi Mengxi Zhou Nastaran Karimi Monsefi Ser-Nam Lim Wei-Lun Chao R. Ramnath 52 1 0 16 Sep 2024
Normalization and effective learning rates in reinforcement learning Clare Lyle Zeyu Zheng Khimya Khetarpal James Martens H. V. Hasselt Razvan Pascanu Will Dabney 26 7 0 01 Jul 2024
How to set AdamW's weight decay as you scale model and dataset size Xi Wang Laurence Aitchison 51 10 0 22 May 2024
Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization Shuo Xie Zhiyuan Li OffRL 55 13 0 05 Apr 2024
NTK-Guided Few-Shot Class Incremental Learning Jingren Liu Zhong Ji Yanwei Pang YunLong Yu CLL 44 3 0 19 Mar 2024
Analyzing and Improving the Training Dynamics of Diffusion Models Tero Karras M. Aittala J. Lehtinen Janne Hellsten Timo Aila S. Laine 61 158 0 05 Dec 2023
D4Explainer: In-Distribution GNN Explanations via Discrete Denoising Diffusion Jialin Chen Shirley Wu Abhijit Gupta Rex Ying DiffM 42 5 0 30 Oct 2023
IndoHerb: Indonesia Medicinal Plants Recognition using Transfer Learning and Deep Learning Muhammad Salman Ikrar Musyaffa N. Yudistira Muhammad Arif Rahman Jati Batoro 21 2 0 03 Aug 2023
On the Weight Dynamics of Deep Normalized Networks Christian H. X. Ali Mehmeti-Göpel Michael Wand 40 1 0 01 Jun 2023
Generating Adversarial Attacks in the Latent Space Nitish Shukla Sudipta Banerjee 36 8 0 10 Apr 2023
Learning Rate Schedules in the Presence of Distribution Shift Matthew Fahrbach Adel Javanmard Vahab Mirrokni Pratik Worah 29 6 0 27 Mar 2023
Convolutional neural networks for medical image segmentation J. Bertels D. Robben Robin Lemmens Dirk Vandermeulen SSeg 15 2 0 17 Nov 2022
Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis Taiki Miyagawa 55 9 0 28 Oct 2022
SGD with Large Step Sizes Learns Sparse Features Maksym Andriushchenko Aditya Varre Loucas Pillaud-Vivien Nicolas Flammarion 50 56 0 11 Oct 2022
Learning to Drop Out: An Adversarial Approach to Training Sequence VAEs Đorđe Miladinovic Kumar Shridhar Kushal Kumar Jain Max B. Paulus J. M. Buhmann Mrinmaya Sachan Carl Allen DRL 33 5 0 26 Sep 2022
Learn From All: Erasing Attention Consistency for Noisy Label Facial Expression Recognition Yuhang Zhang Chengrui Wang Xu Ling Weihong Deng 47 136 0 21 Jul 2022
When Does Re-initialization Work? Sheheryar Zaidi Tudor Berariu Hyunjik Kim J. Bornschein Claudia Clopath Yee Whye Teh Razvan Pascanu 40 10 0 20 Jun 2022
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction Kaifeng Lyu Zhiyuan Li Sanjeev Arora FAtt 54 71 0 14 Jun 2022
Adaptive Gradient Methods with Local Guarantees Zhou Lu Wenhan Xia Sanjeev Arora Elad Hazan ODL 32 9 0 02 Mar 2022
Robust Training of Neural Networks Using Scale Invariant Architectures Zhiyuan Li Srinadh Bhojanapalli Manzil Zaheer Sashank J. Reddi Surinder Kumar 29 27 0 02 Feb 2022
A Theoretical View of Linear Backpropagation and Its Convergence Ziang Li Yiwen Guo Haodi Liu Changshui Zhang AAML 26 3 0 21 Dec 2021
Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect Yuqing Wang Minshuo Chen T. Zhao Molei Tao AI4CE 64 40 0 07 Oct 2021
Stochastic Training is Not Necessary for Generalization Jonas Geiping Micah Goldblum Phillip E. Pope Michael Moeller Tom Goldstein 91 72 0 29 Sep 2021
How to decay your learning rate Aitor Lewkowycz 51 24 0 23 Mar 2021
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs) Zhiyuan Li Sadhika Malladi Sanjeev Arora 49 78 0 24 Feb 2021
Formal Language Theory Meets Modern NLP William Merrill AI4CE NAI 26 12 0 19 Feb 2021
Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics D. Kunin Javier Sagastuy-Breña Surya Ganguli Daniel L. K. Yamins Hidenori Tanaka 107 77 0 08 Dec 2020
Reverse engineering learned optimizers reveals known and novel mechanisms Niru Maheswaranathan David Sussillo Luke Metz Ruoxi Sun Jascha Narain Sohl-Dickstein 24 21 0 04 Nov 2020
GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training Tianle Cai Shengjie Luo Keyulu Xu Di He Tie-Yan Liu Liwei Wang GNN 32 160 0 07 Sep 2020
Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge Chaoyang He M. Annavaram A. Avestimehr FedML 32 23 0 28 Jul 2020
On the training dynamics of deep networks with $L_2$ regularization Aitor Lewkowycz Guy Gur-Ari 46 53 0 15 Jun 2020
Understanding the Role of Training Regimes in Continual Learning Seyed Iman Mirzadeh Mehrdad Farajtabar Razvan Pascanu H. Ghasemzadeh CLL 21 219 0 12 Jun 2020
Few-shot Neural Architecture Search Yiyang Zhao Linnan Wang Yuandong Tian Rodrigo Fonseca Tian Guo 30 90 0 11 Jun 2020
Angle-based Search Space Shrinking for Neural Architecture Search Yiming Hu Yuding Liang Zichao Guo Ruosi Wan Xinming Zhang Yichen Wei Qingyi Gu Jian Sun 24 62 0 28 Apr 2020
On Learning Rates and Schrödinger Operators Bin Shi Weijie J. Su Michael I. Jordan 34 60 0 15 Apr 2020
Evolving Normalization-Activation Layers Hanxiao Liu Andrew Brock Karen Simonyan Quoc V. Le 25 79 0 06 Apr 2020
The Two Regimes of Deep Network Training Guillaume Leclerc Aleksander Madry 27 45 0 24 Feb 2020
Big Transfer (BiT): General Visual Representation Learning Alexander Kolesnikov Lucas Beyer Xiaohua Zhai J. Puigcerver Jessica Yung Sylvain Gelly N. Houlsby MQ 114 1,183 0 24 Dec 2019
Linear Mode Connectivity and the Lottery Ticket Hypothesis Jonathan Frankle Gintare Karolina Dziugaite Daniel M. Roy Michael Carbin MoMe 43 601 0 11 Dec 2019