Obtaining Adjustable Regularization for Free via Iterate Averaging
Jingfeng Wu, Vladimir Braverman, Lin F. Yang · arXiv:2008.06736 · 15 August 2020
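The title names one concrete technique, iterate averaging: run SGD as usual but report an average of its iterates rather than the last one. As a rough, hypothetical illustration only (not the paper's algorithm; the problem sizes, learning rate, and averaging window below are assumed for the example), here is a minimal NumPy sketch of tail-averaged SGD on a least-squares problem:

```python
# Hypothetical sketch of tail-averaged SGD (illustration only, not the paper's method).
# All constants below are assumptions made for this example.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.5 * rng.standard_normal(n)

lr = 0.01          # step size (assumed)
steps = 5000
tail_start = 2500  # begin averaging halfway through (assumed averaging window)

w = np.zeros(d)        # current SGD iterate
w_avg = np.zeros(d)    # running average of the tail iterates
count = 0
for t in range(steps):
    i = rng.integers(n)                      # sample one data point
    grad = (X[i] @ w - y[i]) * X[i]          # stochastic gradient of 0.5*(x_i^T w - y_i)^2
    w -= lr * grad                           # plain SGD step
    if t >= tail_start:
        count += 1
        w_avg += (w - w_avg) / count         # incremental mean of the tail iterates

print("last iterate error :", np.linalg.norm(w - w_true))
print("tail average error :", np.linalg.norm(w_avg - w_true))
```

On problems like this, the averaged iterate tends to behave like a (ridge-type) regularized solution of the unaveraged problem, which is the kind of effect studied by the paper above and by citing works such as "Iterate averaging as regularization for stochastic gradient descent" (Neu and Rosasco) in the list below.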
Papers citing "Obtaining Adjustable Regularization for Free via Iterate Averaging" (24 of 24 shown):
Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks
Wei Hu, Lechao Xiao, Jeffrey Pennington · 16 Jan 2020 · 113 citations
Lookahead Optimizer: k steps forward, 1 step back
Michael Ruogu Zhang, James Lucas, Geoffrey E. Hinton, Jimmy Ba · ODL · 19 Jul 2019 · 725 citations
Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
Yuan Cao, Quanquan Gu · MLT, AI4CE · 30 May 2019 · 384 citations
On Exact Computation with an Infinitely Wide Neural Net
Sanjeev Arora, S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang · 26 Apr 2019 · 915 citations
Acceleration via Symplectic Discretization of High-Resolution Differential Equations
Bin Shi, S. Du, Weijie J. Su, Michael I. Jordan · 11 Feb 2019 · 121 citations
Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel
Colin Wei, Jason D. Lee, Qiang Liu, Tengyu Ma · 12 Oct 2018 · 245 citations
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova · VLM, SSL, SSeg · 11 Oct 2018 · 93,936 citations
A Quantitative Analysis of the Effect of Batch Normalization on Gradient Descent
Yongqiang Cai, Qianxiao Li, Zuowei Shen · 29 Sep 2018 · 3 citations
Averaging Weights Leads to Wider Optima and Better Generalization
Pavel Izmailov, Dmitrii Podoprikhin, T. Garipov, Dmitry Vetrov, A. Wilson · FedML, MoMe · 14 Mar 2018 · 1,643 citations
The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
Zhanxing Zhu, Jingfeng Wu, Ting Yu, Lei Wu, Jin Ma · 01 Mar 2018 · 40 citations
Characterizing Implicit Bias in Terms of Optimization Geometry
Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro · AI4CE · 22 Feb 2018 · 404 citations
Iterate averaging as regularization for stochastic gradient descent
Gergely Neu, Lorenzo Rosasco · MoMe · 22 Feb 2018 · 61 citations
The Implicit Bias of Gradient Descent on Separable Data
Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro · 27 Oct 2017 · 908 citations
The Marginal Value of Adaptive Gradient Methods in Machine Learning
Ashia Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht · ODL · 23 May 2017 · 1,023 citations
The Physical Systems Behind Optimization Algorithms
Lin F. Yang, R. Arora, Vladimir Braverman, T. Zhao · AI4CE · 08 Dec 2016 · 19 citations
Understanding deep learning requires rethinking generalization
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals · HAI · 10 Nov 2016 · 4,620 citations
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun · MedIm · 10 Dec 2015 · 192,638 citations
Stochastic modified equations and adaptive stochastic gradient algorithms
Qianxiao Li, Cheng Tai, Weinan E · 19 Nov 2015 · 282 citations
A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights
Weijie Su, Stephen P. Boyd, Emmanuel J. Candès · 04 Mar 2015 · 1,161 citations
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy · OOD · 11 Feb 2015 · 43,154 citations
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun · VLM · 06 Feb 2015 · 18,534 citations
New insights and perspectives on the natural gradient method
James Martens · ODL · 03 Dec 2014 · 613 citations
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman · FAtt, MDE · 04 Sep 2014 · 99,991 citations
Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)
Francis R. Bach, Eric Moulines · 10 Jun 2013 · 404 citations