Why Do We Need Weight Decay in Modern Deep Learning?

6 October 2023
Maksym Andriushchenko
Francesco D'Angelo
Aditya Varre
Nicolas Flammarion
arXiv:2310.04415

Papers citing "Why Do We Need Weight Decay in Modern Deep Learning?"

15 / 15 papers shown
Each entry lists the title, authors, topic tags (where assigned), the listing's count columns, and the publication date.
Grokking at the Edge of Numerical Stability
Lucas Prieto, Melih Barsbey, Pedro A.M. Mediano, Tolga Birdal
97 · 3 · 0 · 08 Jan 2025

How Much Can We Forget about Data Contamination?
Sebastian Bordt, Suraj Srinivas, Valentyn Boreiko, U. V. Luxburg
73 · 2 · 0 · 04 Oct 2024

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, ..., Zhongli Xie, Zifan Ye, M. Bras, Younes Belkada, Thomas Wolf
VLM
253 · 2,348 · 0 · 09 Nov 2022

Adaptive Gradient Methods at the Edge of Stability
Jeremy M. Cohen, Behrooz Ghorbani, Shankar Krishnan, Naman Agarwal, Sourabh Medapati, ..., Daniel Suo, David E. Cardoze, Zachary Nado, George E. Dahl, Justin Gilmer
ODL
57 · 51 · 0 · 29 Jul 2022

Training Compute-Optimal Large Language Models
Jordan Hoffmann, Sebastian Borgeaud, A. Mensch, Elena Buchatskaya, Trevor Cai, ..., Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre
AI4TS
98 · 1,894 · 0 · 29 Mar 2022

How to decay your learning rate
Aitor Lewkowycz
71 · 24 · 0 · 23 Mar 2021

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora
61 · 74 · 0 · 06 Oct 2020

Fantastic Generalization Measures and Where to Find Them
Yiding Jiang, Behnam Neyshabur, H. Mobahi, Dilip Krishnan, Samy Bengio
AI4CE
46 · 599 · 0 · 04 Dec 2019

An Exponential Learning Rate Schedule for Deep Learning
Zhiyuan Li, Sanjeev Arora
31 · 214 · 0 · 16 Oct 2019

A Study of BFLOAT16 for Deep Learning Training
Dhiraj D. Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, K. Banerjee, ..., Sudarshan Srinivasan, Abhisek Kundu, M. Smelyanskiy, Bharat Kaul, Pradeep Dubey
MQ
48 · 340 · 0 · 29 May 2019

Three Factors Influencing Minima in SGD
Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey
55 · 458 · 0 · 13 Nov 2017

L2 Regularization versus Batch and Weight Normalization
Twan van Laarhoven
24 · 294 · 0 · 16 Jun 2017

Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients
Lukas Balles, Philipp Hennig
60 · 163 · 0 · 22 May 2017

Snapshot Ensembles: Train 1, get M for free
Gao Huang, Yixuan Li, Geoff Pleiss, Zhuang Liu, John E. Hopcroft, Kilian Q. Weinberger
OOD, FedML, UQCV
98 · 938 · 0 · 01 Apr 2017

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
VLM
95 · 18,534 · 0 · 06 Feb 2015