Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence
Aditya Golatkar, Alessandro Achille, Stefano Soatto
arXiv:1905.13277 · 30 May 2019
Papers citing "Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence" (25 papers)
Effective Regularization Through Loss-Function Metalearning. Santiago Gonzalez, Xin Qiu, Risto Miikkulainen. 02 Oct 2020.
Extrapolation for Large-batch Training in Deep Learning. Tao Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi. 10 Jun 2020.
Three Mechanisms of Weight Decay Regularization. Guodong Zhang, Chaoqi Wang, Bowen Xu, Roger C. Grosse. 29 Oct 2018.
Norm matters: efficient and accurate normalization schemes in deep networks. Elad Hoffer, Ron Banner, Itay Golan, Daniel Soudry. 05 Mar 2018.
Visualizing the Loss Landscape of Neural Nets. Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein. 28 Dec 2017.
Three Factors Influencing Minima in SGD. Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey. 13 Nov 2017.
mixup: Beyond Empirical Risk Minimization. Hongyi Zhang, Moustapha Cissé, Yann N. Dauphin, David Lopez-Paz. 25 Oct 2017.
A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks. Behnam Neyshabur, Srinadh Bhojanapalli, Nathan Srebro. 29 Jul 2017.
L2 Regularization versus Batch and Weight Normalization. Twan van Laarhoven. 16 Jun 2017.
Train longer, generalize better: closing the generalization gap in large batch training of neural networks. Elad Hoffer, Itay Hubara, Daniel Soudry. 24 May 2017.
The Marginal Value of Adaptive Gradient Methods in Machine Learning. Ashia Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht. 23 May 2017.
Adaptive Regularization of Some Inverse Problems in Image Analysis. Byung-Woo Hong, JaKeoung Koo, Martin Burger, Stefano Soatto. 09 May 2017.
Sharp Minima Can Generalize For Deep Nets. Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio. 15 Mar 2017.
Entropy-SGD: Biasing Gradient Descent Into Wide Valleys. Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, Riccardo Zecchina. 06 Nov 2016.
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, Ping Tak Peter Tang. 15 Sep 2016.
SGDR: Stochastic Gradient Descent with Warm Restarts. Ilya Loshchilov, Frank Hutter. 13 Aug 2016.
Recurrent Orthogonal Networks and Long-Memory Tasks. Mikael Henaff, Arthur Szlam, Yann LeCun. 22 Feb 2016.
Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. 10 Dec 2015.
Adding Gradient Noise Improves Learning for Very Deep Networks. Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens. 21 Nov 2015.
Cyclical Learning Rates for Training Neural Networks. Leslie N. Smith. 03 Jun 2015.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Sergey Ioffe, Christian Szegedy. 11 Feb 2015.
Striving for Simplicity: The All Convolutional Net. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller. 21 Dec 2014.
New insights and perspectives on the natural gradient method. James Martens. 03 Dec 2014.
The Loss Surfaces of Multilayer Networks. Anna Choromanska, Mikael Henaff, Michaël Mathieu, Gérard Ben Arous, Yann LeCun. 30 Nov 2014.
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. Yann N. Dauphin, Razvan Pascanu, Çağlar Gülçehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio. 10 Jun 2014.