arXiv: 1810.12281
Three Mechanisms of Weight Decay Regularization
29 October 2018
Guodong Zhang, Chaoqi Wang, Bowen Xu, Roger C. Grosse
Papers citing "Three Mechanisms of Weight Decay Regularization" (50 of 54 papers shown):
Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training
Shane Bergsma, Nolan Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness
19 May 2025

Low-Loss Space in Neural Networks is Continuous and Fully Connected
Yongding Tian, Zaid Al-Ars, Maksim Kitsak, P. Hofstee
05 May 2025

Adaptive Extrapolated Proximal Gradient Methods with Variance Reduction for Composite Nonconvex Finite-Sum Minimization
Ganzhao Yuan
28 Feb 2025

Towards Accurate Binary Spiking Neural Networks: Learning with Adaptive Gradient Modulation Mechanism
Yu Liang, Wenjie Wei, A. Belatreche, Honglin Cao, Zijian Zhou, Shuai Wang, Malu Zhang, Yue Yang
21 Feb 2025

How Much Can We Forget about Data Contamination?
Sebastian Bordt, Suraj Srinivas, Valentyn Boreiko, U. V. Luxburg
04 Oct 2024
Classifying Overlapping Gaussian Mixtures in High Dimensions: From Optimal Classifiers to Neural Nets
Khen Cohen, Noam Levi, Yaron Oz
28 May 2024

How to set AdamW's weight decay as you scale model and dataset size
Xi Wang, Laurence Aitchison
22 May 2024

Implicit Bias of AdamW: ℓ∞ Norm Constrained Optimization
Shuo Xie, Zhiyuan Li
05 Apr 2024

Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets
Lorenzo Brigato, Stavroula Mougiakakou
08 Mar 2024
Analyzing and Improving the Training Dynamics of Diffusion Models
Tero Karras, M. Aittala, J. Lehtinen, Janne Hellsten, Timo Aila, S. Laine
05 Dec 2023

Layer-wise Adaptive Step-Sizes for Stochastic First-Order Methods for Deep Learning
Achraf Bahamou, D. Goldfarb
23 May 2023

MoMo: Momentum Models for Adaptive Learning Rates
Fabian Schaipp, Ruben Ohana, Michael Eickenberg, Aaron Defazio, Robert Mansel Gower
12 May 2023

On the Ideal Number of Groups for Isometric Gradient Propagation
Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Sang Woo Kim
07 Feb 2023

A Stochastic Proximal Polyak Step Size
Fabian Schaipp, Robert Mansel Gower, M. Ulbrich
12 Jan 2023
Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis
Taiki Miyagawa
28 Oct 2022

Noise Injection Node Regularization for Robust Learning
N. Levi, I. Bloch, M. Freytsis, T. Volansky
27 Oct 2022

SGD with Large Step Sizes Learns Sparse Features
Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion
11 Oct 2022

Scale-invariant Bayesian Neural Networks with Connectivity Tangent Kernel
Sungyub Kim, Si-hun Park, Kyungsu Kim, Eunho Yang
30 Sep 2022

Distributed Semi-supervised Fuzzy Regression with Interpolation Consistency Regularization
Ye-ling Shi, Leijie Zhang, Zehong Cao, M. Tanveer, Chin-Teng Lin
18 Sep 2022
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora
14 Jun 2022

Guidelines for the Regularization of Gammas in Batch Normalization for Deep Residual Networks
Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Dong Gu Lee, Wonseok Jeong, Sang Woo Kim
15 May 2022

GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black, Stella Biderman, Eric Hallahan, Quentin G. Anthony, Leo Gao, ..., Shivanshu Purohit, Laria Reynolds, J. Tow, Benqi Wang, Samuel Weinbach
14 Apr 2022

Semi-Discrete Normalizing Flows through Differentiable Tessellation
Ricky T. Q. Chen, Brandon Amos, Maximilian Nickel
14 Mar 2022
A Data-Augmentation Is Worth A Thousand Samples: Exact Quantification From Analytical Augmented Sample Moments
Randall Balestriero, Ishan Misra, Yann LeCun
16 Feb 2022

Cyclical Focal Loss
L. Smith
16 Feb 2022

A Geometric Understanding of Natural Gradient
Qinxun Bai, S. Rosenberg, Wei Xu
13 Feb 2022

Deep Learning to advance the Eigenspace Perturbation Method for Turbulence Model Uncertainty Quantification
Khashayar Nobarani, S. Razavi
11 Feb 2022

Robust Training of Neural Networks Using Scale Invariant Architectures
Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Surinder Kumar
02 Feb 2022
Gradient Descent on Neurons and its Link to Approximate Second-Order Optimization
Frederik Benzing
28 Jan 2022

Target-Oriented Fine-tuning for Zero-Resource Named Entity Recognition
Ying Zhang, Fandong Meng, Jinan Xu, Jie Zhou
22 Jul 2021

Initialization and Regularization of Factorized Neural Layers
M. Khodak, Neil A. Tenenholtz, Lester W. Mackey, Nicolò Fusi
03 May 2021

Fundamental Challenges in Deep Learning for Stiff Contact Dynamics
Mihir Parmar, Mathew Halm, Michael Posa
29 Mar 2021

Parareal Neural Networks Emulating a Parallel-in-time Algorithm
Zhanyu Ma, Jiyang Xie, Jingyi Yu
16 Mar 2021

Intraclass clustering: an implicit learning ability that regularizes DNNs
Simon Carbonnelle, Christophe De Vleeschouwer
11 Mar 2021
Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
D. Kunin, Javier Sagastuy-Breña, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka
08 Dec 2020

A Trace-restricted Kronecker-Factored Approximation to Natural Gradient
Kai-Xin Gao, Xiaolei Liu, Zheng-Hai Huang, Min Wang, Zidong Wang, Dachuan Xu, F. Yu
21 Nov 2020

A Random Matrix Theory Approach to Damping in Deep Learning
Diego Granziol, Nicholas P. Baskerville
15 Nov 2020

AEGD: Adaptive Gradient Descent with Energy
Hailiang Liu, Xuping Tian
10 Oct 2020

Group Whitening: Balancing Learning Efficiency and Representational Capacity
Lei Huang, Yi Zhou, Li Liu, Fan Zhu, Ling Shao
28 Sep 2020

Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization
Neha S. Wadia, Daniel Duckworth, S. Schoenholz, Ethan Dyer, Jascha Narain Sohl-Dickstein
17 Aug 2020
Can we Estimate Truck Accident Risk from Telemetric Data using Machine Learning?
Antonio Hebert, Ian Marineau, Gilles Gervais, Tristan Glatard, Brigitte Jaumard
17 Jul 2020

A General Family of Stochastic Proximal Gradient Methods for Deep Learning
Jihun Yun, A. Lozano, Eunho Yang
15 Jul 2020

When Does Preconditioning Help or Hurt Generalization?
S. Amari, Jimmy Ba, Roger C. Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu
18 Jun 2020

Understanding and Mitigating Exploding Inverses in Invertible Neural Networks
Jens Behrmann, Paul Vicol, Kuan-Chieh Jackson Wang, Roger C. Grosse, J. Jacobsen
16 Jun 2020

New Interpretations of Normalization Methods in Deep Learning
Jiacheng Sun, Xiangyong Cao, Hanwen Liang, Weiran Huang, Zewei Chen, Zhenguo Li
16 Jun 2020
On the training dynamics of deep networks with L2 regularization
Aitor Lewkowycz, Guy Gur-Ari
15 Jun 2020

On the Optimal Weighted ℓ2 Regularization in Overparameterized Linear Regression
Denny Wu, Ji Xu
10 Jun 2020

Gradient Centralization: A New Optimization Technique for Deep Neural Networks
Hongwei Yong, Jianqiang Huang, Xiansheng Hua, Lei Zhang
03 Apr 2020

Iterative Averaging in the Quest for Best Test Error
Diego Granziol, Xingchen Wan, Samuel Albanie, Stephen J. Roberts
02 Mar 2020

Topologically Densified Distributions
Christoph Hofer, Florian Graf, Marc Niethammer, Roland Kwitt
12 Feb 2020