2012.13841
Cited By
Understanding Decoupled and Early Weight Decay
27 December 2020
Johan Bjorck
Kilian Q. Weinberger
Carla P. Gomes
Papers citing "Understanding Decoupled and Early Weight Decay"
6 / 6 papers shown
Title
Scaling Optimal LR Across Token Horizons
Johan Bjorck
Alon Benhaim
Vishrav Chaudhary
Furu Wei
Xia Song
80
5
0
30 Sep 2024
Adversarial Examples Improve Image Recognition
Cihang Xie
Mingxing Tan
Boqing Gong
Jiang Wang
Alan Yuille
Quoc V. Le
AAML
68
564
0
21 Nov 2019
Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence
Aditya Golatkar
Alessandro Achille
Stefano Soatto
58
95
0
30 May 2019
L2 Regularization versus Batch and Weight Normalization
Twan van Laarhoven
33
294
0
16 Jun 2017
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
355
2,922
0
15 Sep 2016
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih
Adria Puigdomenech Badia
M. Berk Mirza
Alex Graves
Timothy Lillicrap
Tim Harley
David Silver
Koray Kavukcuoglu
161
8,805
0
04 Feb 2016