To Each Optimizer a Norm, To Each Norm its Generalization
Sharan Vaswani, Reza Babanezhad, Jose Gallego, Aaron Mishkin, Simon Lacoste-Julien, Nicolas Le Roux
arXiv:2006.06821, 11 June 2020
Papers citing "To Each Optimizer a Norm, To Each Norm its Generalization" (26 papers shown):
1. Implicit Regularization in Deep Learning May Not Be Explainable by Norms. Noam Razin, Nadav Cohen. 13 May 2020.
2. Finite-sample Analysis of Interpolating Linear Classifiers in the Overparameterized Regime. Niladri S. Chatterji, Philip M. Long. 25 Apr 2020.
3. BackPACK: Packing more into backprop. Felix Dangel, Frederik Kunstner, Philipp Hennig. 23 Dec 2019.
4. Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks. Sanjeev Arora, S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu. 03 Oct 2019.
5. Bias of Homotopic Gradient Descent for the Hinge Loss. Denali Molitor, Deanna Needell, Rachel A. Ward. 26 Jul 2019.
6. Benign Overfitting in Linear Regression. Peter L. Bartlett, Philip M. Long, Gábor Lugosi, Alexander Tsigler. 26 Jun 2019.
7. The Implicit Bias of AdaGrad on Separable Data. Qian Qian, Xiaoyuan Qian. 09 Jun 2019.
8. Implicit Regularization in Deep Matrix Factorization. Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo. 31 May 2019.
9. The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study. Daniel S. Park, Jascha Narain Sohl-Dickstein, Quoc V. Le, Samuel L. Smith. 09 May 2019.
10. Harmless interpolation of noisy data in regression. Vidya Muthukumar, Kailas Vodrahalli, Vignesh Subramanian, A. Sahai. 21 Mar 2019.
11. Surprises in High-Dimensional Ridgeless Least Squares Interpolation. Trevor Hastie, Andrea Montanari, Saharon Rosset, Robert Tibshirani. 19 Mar 2019.
12. Reconciling modern machine learning practice and the bias-variance trade-off. M. Belkin, Daniel J. Hsu, Siyuan Ma, Soumik Mandal. 28 Dec 2018.
13. Gradient descent aligns the layers of deep linear networks. Ziwei Ji, Matus Telgarsky. 04 Oct 2018.
14. Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Arthur Jacot, Franck Gabriel, Clément Hongler. 20 Jun 2018.
15. Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate. Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry. 05 Jun 2018.
16. Implicit Bias of Gradient Descent on Linear Convolutional Networks. Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro. 01 Jun 2018.
17. Convergence of Gradient Descent on Separable Data. Mor Shpigel Nacson, Jason D. Lee, Suriya Gunasekar, Pedro H. P. Savarese, Nathan Srebro, Daniel Soudry. 05 Mar 2018.
18. Characterizing Implicit Bias in Terms of Optimization Geometry. Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro. 22 Feb 2018.
19. To understand deep learning we need to understand kernel learning. M. Belkin, Siyuan Ma, Soumik Mandal. 05 Feb 2018.
20. Improving Generalization Performance by Switching from Adam to SGD. N. Keskar, R. Socher. 20 Dec 2017.
21. The Implicit Bias of Gradient Descent on Separable Data. Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro. 27 Oct 2017.
22. Implicit Regularization in Matrix Factorization. Suriya Gunasekar, Blake E. Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro. 25 May 2017.
23. The Marginal Value of Adaptive Gradient Methods in Machine Learning. Ashia Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht. 23 May 2017.
24. Understanding deep learning requires rethinking generalization. Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals. 10 Nov 2016.
25. Adam: A Method for Stochastic Optimization. Diederik P. Kingma, Jimmy Ba. 22 Dec 2014.
26. Sublinear Optimization for Machine Learning. K. Clarkson, Elad Hazan, David P. Woodruff. 21 Oct 2010.