1910.07454
Cited By
An Exponential Learning Rate Schedule for Deep Learning
16 October 2019
Zhiyuan Li
Sanjeev Arora
Papers citing "An Exponential Learning Rate Schedule for Deep Learning"
41 / 41 papers shown
Title
Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training
Shane Bergsma
Nolan Dey
Gurpreet Gosal
Gavia Gray
Daria Soboleva
Joel Hestness
24
0
0
19 May 2025
A Unified Framework for Neural Computation and Learning Over Time
S. Melacci
Alessandro Betti
Michele Casoni
Tommaso Guidi
Matteo Tiezzi
Marco Gori
AI4TS
35
0
0
18 Sep 2024
Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning
Amin Karimi Monsefi
Mengxi Zhou
Nastaran Karimi Monsefi
Ser-Nam Lim
Wei-Lun Chao
R. Ramnath
52
1
0
16 Sep 2024
Normalization and effective learning rates in reinforcement learning
Clare Lyle
Zeyu Zheng
Khimya Khetarpal
James Martens
H. V. Hasselt
Razvan Pascanu
Will Dabney
26
7
0
01 Jul 2024
How to set AdamW's weight decay as you scale model and dataset size
Xi Wang
Laurence Aitchison
51
10
0
22 May 2024
Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
Shuo Xie
Zhiyuan Li
OffRL
55
13
0
05 Apr 2024
NTK-Guided Few-Shot Class Incremental Learning
Jingren Liu
Zhong Ji
Yanwei Pang
YunLong Yu
CLL
44
3
0
19 Mar 2024
Analyzing and Improving the Training Dynamics of Diffusion Models
Tero Karras
M. Aittala
J. Lehtinen
Janne Hellsten
Timo Aila
S. Laine
61
158
0
05 Dec 2023
D4Explainer: In-Distribution GNN Explanations via Discrete Denoising Diffusion
Jialin Chen
Shirley Wu
Abhijit Gupta
Rex Ying
DiffM
42
5
0
30 Oct 2023
IndoHerb: Indonesia Medicinal Plants Recognition using Transfer Learning and Deep Learning
Muhammad Salman Ikrar Musyaffa
N. Yudistira
Muhammad Arif Rahman
Jati Batoro
21
2
0
03 Aug 2023
On the Weight Dynamics of Deep Normalized Networks
Christian H. X. Ali Mehmeti-Göpel
Michael Wand
40
1
0
01 Jun 2023
Generating Adversarial Attacks in the Latent Space
Nitish Shukla
Sudipta Banerjee
36
8
0
10 Apr 2023
Learning Rate Schedules in the Presence of Distribution Shift
Matthew Fahrbach
Adel Javanmard
Vahab Mirrokni
Pratik Worah
29
6
0
27 Mar 2023
Convolutional neural networks for medical image segmentation
J. Bertels
D. Robben
Robin Lemmens
Dirk Vandermeulen
SSeg
15
2
0
17 Nov 2022
Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis
Taiki Miyagawa
55
9
0
28 Oct 2022
SGD with Large Step Sizes Learns Sparse Features
Maksym Andriushchenko
Aditya Varre
Loucas Pillaud-Vivien
Nicolas Flammarion
50
56
0
11 Oct 2022
Learning to Drop Out: An Adversarial Approach to Training Sequence VAEs
Đorđe Miladinovic
Kumar Shridhar
Kushal Kumar Jain
Max B. Paulus
J. M. Buhmann
Mrinmaya Sachan
Carl Allen
DRL
33
5
0
26 Sep 2022
Learn From All: Erasing Attention Consistency for Noisy Label Facial Expression Recognition
Yuhang Zhang
Chengrui Wang
Xu Ling
Weihong Deng
47
136
0
21 Jul 2022
When Does Re-initialization Work?
Sheheryar Zaidi
Tudor Berariu
Hyunjik Kim
J. Bornschein
Claudia Clopath
Yee Whye Teh
Razvan Pascanu
40
10
0
20 Jun 2022
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu
Zhiyuan Li
Sanjeev Arora
FAtt
54
71
0
14 Jun 2022
Adaptive Gradient Methods with Local Guarantees
Zhou Lu
Wenhan Xia
Sanjeev Arora
Elad Hazan
ODL
32
9
0
02 Mar 2022
Robust Training of Neural Networks Using Scale Invariant Architectures
Zhiyuan Li
Srinadh Bhojanapalli
Manzil Zaheer
Sashank J. Reddi
Surinder Kumar
29
27
0
02 Feb 2022
A Theoretical View of Linear Backpropagation and Its Convergence
Ziang Li
Yiwen Guo
Haodi Liu
Changshui Zhang
AAML
26
3
0
21 Dec 2021
Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Yuqing Wang
Minshuo Chen
T. Zhao
Molei Tao
AI4CE
64
40
0
07 Oct 2021
Stochastic Training is Not Necessary for Generalization
Jonas Geiping
Micah Goldblum
Phillip E. Pope
Michael Moeller
Tom Goldstein
91
72
0
29 Sep 2021
How to decay your learning rate
Aitor Lewkowycz
51
24
0
23 Mar 2021
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li
Sadhika Malladi
Sanjeev Arora
49
78
0
24 Feb 2021
Formal Language Theory Meets Modern NLP
William Merrill
AI4CE
NAI
26
12
0
19 Feb 2021
Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
D. Kunin
Javier Sagastuy-Breña
Surya Ganguli
Daniel L. K. Yamins
Hidenori Tanaka
107
77
0
08 Dec 2020
Reverse engineering learned optimizers reveals known and novel mechanisms
Niru Maheswaranathan
David Sussillo
Luke Metz
Ruoxi Sun
Jascha Narain Sohl-Dickstein
24
21
0
04 Nov 2020
GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training
Tianle Cai
Shengjie Luo
Keyulu Xu
Di He
Tie-Yan Liu
Liwei Wang
GNN
32
160
0
07 Sep 2020
Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge
Chaoyang He
M. Annavaram
A. Avestimehr
FedML
32
23
0
28 Jul 2020
On the training dynamics of deep networks with $L_2$ regularization
Aitor Lewkowycz
Guy Gur-Ari
46
53
0
15 Jun 2020
Understanding the Role of Training Regimes in Continual Learning
Seyed Iman Mirzadeh
Mehrdad Farajtabar
Razvan Pascanu
H. Ghasemzadeh
CLL
21
219
0
12 Jun 2020
Few-shot Neural Architecture Search
Yiyang Zhao
Linnan Wang
Yuandong Tian
Rodrigo Fonseca
Tian Guo
30
90
0
11 Jun 2020
Angle-based Search Space Shrinking for Neural Architecture Search
Yiming Hu
Yuding Liang
Zichao Guo
Ruosi Wan
Xinming Zhang
Yichen Wei
Qingyi Gu
Jian Sun
24
62
0
28 Apr 2020
On Learning Rates and Schrödinger Operators
Bin Shi
Weijie J. Su
Michael I. Jordan
34
60
0
15 Apr 2020
Evolving Normalization-Activation Layers
Hanxiao Liu
Andrew Brock
Karen Simonyan
Quoc V. Le
25
79
0
06 Apr 2020
The Two Regimes of Deep Network Training
Guillaume Leclerc
Aleksander Madry
27
45
0
24 Feb 2020
Big Transfer (BiT): General Visual Representation Learning
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
J. Puigcerver
Jessica Yung
Sylvain Gelly
N. Houlsby
MQ
114
1,183
0
24 Dec 2019
Linear Mode Connectivity and the Lottery Ticket Hypothesis
Jonathan Frankle
Gintare Karolina Dziugaite
Daniel M. Roy
Michael Carbin
MoMe
43
601
0
11 Dec 2019