To Each Optimizer a Norm, To Each Norm its Generalization
Sharan Vaswani, Reza Babanezhad, Jose Gallego, Aaron Mishkin, Simon Lacoste-Julien, Nicolas Le Roux
arXiv:2006.06821, 11 June 2020
Papers citing "To Each Optimizer a Norm, To Each Norm its Generalization" (26 papers shown):
1. Implicit Regularization in Deep Learning May Not Be Explainable by Norms. Noam Razin, Nadav Cohen. 13 May 2020.
2. Finite-sample Analysis of Interpolating Linear Classifiers in the Overparameterized Regime. Niladri S. Chatterji, Philip M. Long. 25 Apr 2020.
3. BackPACK: Packing more into backprop. Felix Dangel, Frederik Kunstner, Philipp Hennig. 23 Dec 2019.
4. Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks. Sanjeev Arora, S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu. 03 Oct 2019.
5. Bias of Homotopic Gradient Descent for the Hinge Loss. Denali Molitor, Deanna Needell, Rachel A. Ward. 26 Jul 2019.
6. Benign Overfitting in Linear Regression. Peter L. Bartlett, Philip M. Long, Gábor Lugosi, Alexander Tsigler. 26 Jun 2019.
7. The Implicit Bias of AdaGrad on Separable Data. Qian Qian, Xiaoyuan Qian. 09 Jun 2019.
8. Implicit Regularization in Deep Matrix Factorization. Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo. 31 May 2019.
9. The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study. Daniel S. Park, Jascha Narain Sohl-Dickstein, Quoc V. Le, Samuel L. Smith. 09 May 2019.
10. Harmless interpolation of noisy data in regression. Vidya Muthukumar, Kailas Vodrahalli, Vignesh Subramanian, A. Sahai. 21 Mar 2019.
11. Surprises in High-Dimensional Ridgeless Least Squares Interpolation. Trevor Hastie, Andrea Montanari, Saharon Rosset, Robert Tibshirani. 19 Mar 2019.
12. Reconciling modern machine learning practice and the bias-variance trade-off. M. Belkin, Daniel J. Hsu, Siyuan Ma, Soumik Mandal. 28 Dec 2018.
13. Gradient descent aligns the layers of deep linear networks. Ziwei Ji, Matus Telgarsky. 04 Oct 2018.
14. Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Arthur Jacot, Franck Gabriel, Clément Hongler. 20 Jun 2018.
15. Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate. Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry. 05 Jun 2018.
16. Implicit Bias of Gradient Descent on Linear Convolutional Networks. Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro. 01 Jun 2018.
17. Convergence of Gradient Descent on Separable Data. Mor Shpigel Nacson, Jason D. Lee, Suriya Gunasekar, Pedro H. P. Savarese, Nathan Srebro, Daniel Soudry. 05 Mar 2018.
18. Characterizing Implicit Bias in Terms of Optimization Geometry. Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro. 22 Feb 2018.
19. To understand deep learning we need to understand kernel learning. M. Belkin, Siyuan Ma, Soumik Mandal. 05 Feb 2018.
20. Improving Generalization Performance by Switching from Adam to SGD. N. Keskar, R. Socher. 20 Dec 2017.
21. The Implicit Bias of Gradient Descent on Separable Data. Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro. 27 Oct 2017.
22. Implicit Regularization in Matrix Factorization. Suriya Gunasekar, Blake E. Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro. 25 May 2017.
23. The Marginal Value of Adaptive Gradient Methods in Machine Learning. Ashia Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht. 23 May 2017.
24. Understanding deep learning requires rethinking generalization. Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals. 10 Nov 2016.
25. Adam: A Method for Stochastic Optimization. Diederik P. Kingma, Jimmy Ba. 22 Dec 2014.
26. Sublinear Optimization for Machine Learning. K. Clarkson, Elad Hazan, David P. Woodruff. 21 Oct 2010.