An Exponential Learning Rate Schedule for Deep Learning

16 October 2019 · Zhiyuan Li, Sanjeev Arora · arXiv:1910.07454
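
For context on the cited paper: its central result is that for networks with normalization layers, standard SGD with momentum and weight decay is equivalent to training with an exponentially increasing learning rate and no weight decay. Below is a minimal sketch of such a schedule, assuming PyTorch's built-in ExponentialLR scheduler; the model, data, and gamma = 1.05 are arbitrary illustrative choices, not the authors' setup:

    import torch

    # Exponentially growing learning rate, lr_t = lr_0 * gamma**t (gamma > 1).
    model = torch.nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=1.05)  # illustrative gamma

    x, y = torch.randn(32, 10), torch.randn(32, 1)
    for step in range(5):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
        sched.step()  # multiplies the current learning rate by gamma
        print(f"step {step}: lr = {sched.get_last_lr()[0]:.4f}")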

Papers citing "An Exponential Learning Rate Schedule for Deep Learning" (41 of 41 papers shown)

Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training
Shane Bergsma, Nolan Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness
19 May 2025

A Unified Framework for Neural Computation and Learning Over Time
S. Melacci, Alessandro Betti, Michele Casoni, Tommaso Guidi, Matteo Tiezzi, Marco Gori
18 Sep 2024 · AI4TS

Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning
Amin Karimi Monsefi, Mengxi Zhou, Nastaran Karimi Monsefi, Ser-Nam Lim, Wei-Lun Chao, R. Ramnath
16 Sep 2024

Normalization and effective learning rates in reinforcement learning
Clare Lyle, Zeyu Zheng, Khimya Khetarpal, James Martens, H. V. Hasselt, Razvan Pascanu, Will Dabney
01 Jul 2024

How to set AdamW's weight decay as you scale model and dataset size
Xi Wang, Laurence Aitchison
22 May 2024

Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
Shuo Xie, Zhiyuan Li
05 Apr 2024 · OffRL

NTK-Guided Few-Shot Class Incremental Learning
Jingren Liu, Zhong Ji, Yanwei Pang, YunLong Yu
19 Mar 2024 · CLL

Analyzing and Improving the Training Dynamics of Diffusion Models
Tero Karras, M. Aittala, J. Lehtinen, Janne Hellsten, Timo Aila, S. Laine
05 Dec 2023

D4Explainer: In-Distribution GNN Explanations via Discrete Denoising Diffusion
Jialin Chen, Shirley Wu, Abhijit Gupta, Rex Ying
30 Oct 2023 · DiffM

IndoHerb: Indonesia Medicinal Plants Recognition using Transfer Learning and Deep Learning
Muhammad Salman Ikrar Musyaffa, N. Yudistira, Muhammad Arif Rahman, Jati Batoro
03 Aug 2023

On the Weight Dynamics of Deep Normalized Networks
Christian H. X. Ali Mehmeti-Göpel, Michael Wand
01 Jun 2023

Generating Adversarial Attacks in the Latent Space
Nitish Shukla, Sudipta Banerjee
10 Apr 2023

Learning Rate Schedules in the Presence of Distribution Shift
Matthew Fahrbach, Adel Javanmard, Vahab Mirrokni, Pratik Worah
27 Mar 2023

Convolutional neural networks for medical image segmentation
J. Bertels, D. Robben, Robin Lemmens, Dirk Vandermeulen
17 Nov 2022 · SSeg

Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis
Taiki Miyagawa
28 Oct 2022

SGD with Large Step Sizes Learns Sparse Features
Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion
11 Oct 2022

Learning to Drop Out: An Adversarial Approach to Training Sequence VAEs
Ðorðe Miladinovic, Kumar Shridhar, Kushal Kumar Jain, Max B. Paulus, J. M. Buhmann, Mrinmaya Sachan, Carl Allen
26 Sep 2022 · DRL

Learn From All: Erasing Attention Consistency for Noisy Label Facial Expression Recognition
Yuhang Zhang, Chengrui Wang, Xu Ling, Weihong Deng
21 Jul 2022

When Does Re-initialization Work?
Sheheryar Zaidi, Tudor Berariu, Hyunjik Kim, J. Bornschein, Claudia Clopath, Yee Whye Teh, Razvan Pascanu
20 Jun 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora
14 Jun 2022 · FAtt

Adaptive Gradient Methods with Local Guarantees
Zhou Lu, Wenhan Xia, Sanjeev Arora, Elad Hazan
02 Mar 2022 · ODL

Robust Training of Neural Networks Using Scale Invariant Architectures
Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Surinder Kumar
02 Feb 2022

A Theoretical View of Linear Backpropagation and Its Convergence
Ziang Li, Yiwen Guo, Haodi Liu, Changshui Zhang
21 Dec 2021 · AAML

Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao
07 Oct 2021 · AI4CE

Stochastic Training is Not Necessary for Generalization
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein
29 Sep 2021

How to decay your learning rate
Aitor Lewkowycz
23 Mar 2021

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li, Sadhika Malladi, Sanjeev Arora
24 Feb 2021

Formal Language Theory Meets Modern NLP
William Merrill
19 Feb 2021 · AI4CE, NAI

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
D. Kunin, Javier Sagastuy-Breña, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka
08 Dec 2020

Reverse engineering learned optimizers reveals known and novel mechanisms
Niru Maheswaranathan, David Sussillo, Luke Metz, Ruoxi Sun, Jascha Narain Sohl-Dickstein
04 Nov 2020

GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training
Tianle Cai, Shengjie Luo, Keyulu Xu, Di He, Tie-Yan Liu, Liwei Wang
07 Sep 2020 · GNN

Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge
Chaoyang He, M. Annavaram, A. Avestimehr
28 Jul 2020 · FedML

On the training dynamics of deep networks with $L_2$ regularization
Aitor Lewkowycz, Guy Gur-Ari
15 Jun 2020

Understanding the Role of Training Regimes in Continual Learning
Seyed Iman Mirzadeh, Mehrdad Farajtabar, Razvan Pascanu, H. Ghasemzadeh
12 Jun 2020 · CLL

Few-shot Neural Architecture Search
Yiyang Zhao, Linnan Wang, Yuandong Tian, Rodrigo Fonseca, Tian Guo
11 Jun 2020

Angle-based Search Space Shrinking for Neural Architecture Search
Yiming Hu, Yuding Liang, Zichao Guo, Ruosi Wan, Xinming Zhang, Yichen Wei, Qingyi Gu, Jian Sun
28 Apr 2020

On Learning Rates and Schrödinger Operators
Bin Shi, Weijie J. Su, Michael I. Jordan
15 Apr 2020

Evolving Normalization-Activation Layers
Hanxiao Liu, Andrew Brock, Karen Simonyan, Quoc V. Le
06 Apr 2020

The Two Regimes of Deep Network Training
Guillaume Leclerc, Aleksander Madry
24 Feb 2020

Big Transfer (BiT): General Visual Representation Learning
Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, J. Puigcerver, Jessica Yung, Sylvain Gelly, N. Houlsby
24 Dec 2019 · MQ

Linear Mode Connectivity and the Lottery Ticket Hypothesis
Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin
11 Dec 2019 · MoMe