Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence

30 May 2019
Aditya Golatkar
Alessandro Achille
Stefano Soatto
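
The paper's titular claim is that weight decay and data augmentation shape what the network learns during an early critical period of training and matter little near convergence. The sketch below is one minimal way to exercise that schedule in PyTorch; it is not the authors' code, and the cutoff REG_EPOCHS, the ResNet-18 model, and the CIFAR-10 pipeline are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

REG_EPOCHS = 50       # assumed cutoff for the early phase (hypothetical value)
TOTAL_EPOCHS = 200
WEIGHT_DECAY = 5e-4

augmented = T.Compose([T.RandomCrop(32, padding=4),
                       T.RandomHorizontalFlip(),
                       T.ToTensor()])
plain = T.ToTensor()

# Two views of the same training set: with and without augmentation.
aug_set = torchvision.datasets.CIFAR10("data", train=True, download=True,
                                       transform=augmented)
plain_set = torchvision.datasets.CIFAR10("data", train=True,
                                         transform=plain)
aug_loader = DataLoader(aug_set, batch_size=128, shuffle=True)
plain_loader = DataLoader(plain_set, batch_size=128, shuffle=True)

model = torchvision.models.resnet18(num_classes=10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=WEIGHT_DECAY)

for epoch in range(TOTAL_EPOCHS):
    early = epoch < REG_EPOCHS
    # Regularize only during the early phase; the paper's claim is that
    # removing weight decay and augmentation later changes little.
    for group in optimizer.param_groups:
        group["weight_decay"] = WEIGHT_DECAY if early else 0.0
    for x, y in (aug_loader if early else plain_loader):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```

Toggling weight_decay through optimizer.param_groups keeps the momentum state intact, which avoids rebuilding the optimizer at the phase boundary.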

Papers citing "Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence"

25 of 25 citing papers shown:
Effective Regularization Through Loss-Function Metalearning · Santiago Gonzalez, Xin Qiu, Risto Miikkulainen · 02 Oct 2020
Extrapolation for Large-batch Training in Deep Learning · Tao Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi · 10 Jun 2020
Three Mechanisms of Weight Decay Regularization · Guodong Zhang, Chaoqi Wang, Bowen Xu, Roger C. Grosse · 29 Oct 2018
Norm matters: efficient and accurate normalization schemes in deep networks · Elad Hoffer, Ron Banner, Itay Golan, Daniel Soudry · 05 Mar 2018 · OffRL
Visualizing the Loss Landscape of Neural Nets · Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein · 28 Dec 2017
Three Factors Influencing Minima in SGD · Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey · 13 Nov 2017
mixup: Beyond Empirical Risk Minimization · Hongyi Zhang, Moustapha Cissé, Yann N. Dauphin, David Lopez-Paz · 25 Oct 2017 · NoLa
A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks · Behnam Neyshabur, Srinadh Bhojanapalli, Nathan Srebro · 29 Jul 2017
L2 Regularization versus Batch and Weight Normalization · Twan van Laarhoven · 16 Jun 2017
Train longer, generalize better: closing the generalization gap in large batch training of neural networks · Elad Hoffer, Itay Hubara, Daniel Soudry · 24 May 2017 · ODL
The Marginal Value of Adaptive Gradient Methods in Machine Learning · Ashia Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht · 23 May 2017 · ODL
Adaptive Regularization of Some Inverse Problems in Image Analysis · Byung-Woo Hong, JaKeoung Koo, Martin Burger, Stefano Soatto · 09 May 2017
Sharp Minima Can Generalize For Deep Nets · Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio · 15 Mar 2017 · ODL
Entropy-SGD: Biasing Gradient Descent Into Wide Valleys · Pratik Chaudhari, A. Choromańska, Stefano Soatto, Yann LeCun, Carlo Baldassi, C. Borgs, J. Chayes, Levent Sagun, R. Zecchina · 06 Nov 2016 · ODL
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima · N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · 15 Sep 2016 · ODL
SGDR: Stochastic Gradient Descent with Warm Restarts · I. Loshchilov, Frank Hutter · 13 Aug 2016 · ODL
Recurrent Orthogonal Networks and Long-Memory Tasks · Mikael Henaff, Arthur Szlam, Yann LeCun · 22 Feb 2016
Deep Residual Learning for Image Recognition · Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun · 10 Dec 2015 · MedIm
Adding Gradient Noise Improves Learning for Very Deep Networks · Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens · 21 Nov 2015 · AI4CE, ODL
Cyclical Learning Rates for Training Neural Networks · L. Smith · 03 Jun 2015 · ODL
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift · Sergey Ioffe, Christian Szegedy · 11 Feb 2015 · OOD
Striving for Simplicity: The All Convolutional Net · Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller · 21 Dec 2014 · FAtt
New insights and perspectives on the natural gradient method · James Martens · 03 Dec 2014 · ODL
The Loss Surfaces of Multilayer Networks · A. Choromańska, Mikael Henaff, Michaël Mathieu, Gerard Ben Arous, Yann LeCun · 30 Nov 2014 · ODL
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization · Yann N. Dauphin, Razvan Pascanu, Çağlar Gülçehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio · 10 Jun 2014 · ODL