arXiv:2305.17212
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
Atli Kosson, Bettina Messmer, Martin Jaggi
26 May 2023
Papers citing "Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks" (24 of 24 papers shown)
Grokking at the Edge of Numerical Stability
Lucas Prieto, Melih Barsbey, Pedro A.M. Mediano, Tolga Birdal (08 Jan 2025)

How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Phillips Foster, Sham Kakade (29 Oct 2024)

Symbolic Discovery of Optimization Algorithms
Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, ..., Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, Quoc V. Le (13 Feb 2023)

Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes
M. Kodryan, E. Lobacheva, M. Nakhodnov, Dmitry Vetrov (08 Sep 2022)

Training Compute-Optimal Large Language Models
Jordan Hoffmann, Sebastian Borgeaud, A. Mensch, Elena Buchatskaya, Trevor Cai, ..., Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre (29 Mar 2022)

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li, Sadhika Malladi, Sanjeev Arora (24 Feb 2021)

Learning by Turning: Neural Architecture Aware Optimisation
Yang Liu, Jeremy Bernstein, M. Meister, Yisong Yue (14 Feb 2021)

High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan (11 Feb 2021)

On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective
Zeke Xie, Zhiqiang Xu, Jingzhao Zhang, Issei Sato, Masashi Sugiyama (23 Nov 2020)

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora (06 Oct 2020)

An Exponential Learning Rate Schedule for Deep Learning
Zhiyuan Li, Sanjeev Arora (16 Oct 2019)

fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli (01 Apr 2019)

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh (01 Apr 2019)

An Empirical Model of Large-Batch Training
Sam McCandlish, Jared Kaplan, Dario Amodei, OpenAI Dota Team (14 Dec 2018)

Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue, Jaehoon Lee, J. Antognini, J. Mamou, J. Ketterling, Yao Wang (08 Nov 2018)

Group Normalization
Yuxin Wu, Kaiming He (22 Mar 2018)

Projection Based Weight Normalization for Deep Neural Networks
Lei Huang, Xianglong Liu, B. Lang, Yue Liu (06 Oct 2017)

Large Batch Training of Convolutional Networks
Yang You, Igor Gitman, Boris Ginsburg (13 Aug 2017)

L2 Regularization versus Batch and Weight Normalization
Twan van Laarhoven (16 Jun 2017)

Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization
Xun Huang, Serge J. Belongie (20 Mar 2017)

Pointer Sentinel Mixture Models
Stephen Merity, Caiming Xiong, James Bradbury, R. Socher (26 Sep 2016)

Layer Normalization
Jimmy Lei Ba, J. Kiros, Geoffrey E. Hinton (21 Jul 2016)

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy (11 Feb 2015)

ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei (01 Sep 2014)