Train faster, generalize better: Stability of stochastic gradient descent

3 September 2015

Moritz Hardt

Benjamin Recht

Y. Singer

Papers citing "Train faster, generalize better: Stability of stochastic gradient descent"

50 / 679 papers shown

Title, authors, topic tags, counts, date
Learning Compact Neural Networks with Regularization Samet Oymak MLT 101 39 0 05 Feb 2018
Generalization Error Bounds for Noisy, Iterative Algorithms Ankit Pensia Varun Jog Po-Ling Loh 98 114 0 12 Jan 2018
Theory of Deep Learning III: explaining the non-overfitting puzzle T. Poggio Kenji Kawaguchi Q. Liao Alycia Lee Lorenzo Rosasco Xavier Boix Jack Hidary H. Mhaskar ODL 104 128 0 30 Dec 2017
Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations Yuanzhi Li Tengyu Ma Hongyang R. Zhang 74 31 0 26 Dec 2017
DropMax: Adaptive Variational Softmax Haebeom Lee Juho Lee Saehoon Kim Eunho Yang Sung Ju Hwang 56 13 0 21 Dec 2017
Improving Generalization Performance by Switching from Adam to SGD N. Keskar R. Socher ODL 107 524 0 20 Dec 2017
Statistical Inference for the Population Landscape via Moment Adjusted Stochastic Gradients Tengyuan Liang Weijie Su 72 21 0 20 Dec 2017
Size-Independent Sample Complexity of Neural Networks Noah Golowich Alexander Rakhlin Ohad Shamir 185 551 0 18 Dec 2017
Mathematics of Deep Learning René Vidal Joan Bruna Raja Giryes Stefano Soatto OOD 70 120 0 13 Dec 2017
Online Learning via the Differential Privacy Lens Jacob D. Abernethy Young Hun Jung Chansoo Lee Audra McMillan Ambuj Tewari 42 13 0 27 Nov 2017
Regularization for Deep Learning: A Taxonomy J. Kukačka Vladimir Golkov Zorah Lähner 101 336 0 29 Oct 2017
The Implicit Bias of Gradient Descent on Separable Data Daniel Soudry Elad Hoffer Mor Shpigel Nacson Suriya Gunasekar Nathan Srebro 263 926 0 27 Oct 2017
Stability and Generalization of Learning Algorithms that Converge to Global Optima Zachary B. Charles Dimitris Papailiopoulos MLT 75 163 0 23 Oct 2017
Function Norms and Regularization in Deep Networks Amal Rannen Triki Maxim Berman Matthew B. Blaschko 78 2 0 18 Oct 2017
Generalization in Deep Learning Kenji Kawaguchi L. Kaelbling Yoshua Bengio ODL 216 460 0 16 Oct 2017
A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent Ben London 100 79 0 19 Sep 2017
The Impact of Local Geometry and Batch Size on Stochastic Gradient Descent for Nonconvex Problems V. Patel MLT 73 8 0 14 Sep 2017
Stochastic Gradient Descent: Going As Fast As Possible But Not Faster Alice Schoenauer Sebag Marc Schoenauer Michèle Sebag 45 11 0 05 Sep 2017
Convergence of Unregularized Online Learning Algorithms Yunwen Lei Lei Shi Zheng-Chu Guo 89 14 0 09 Aug 2017
Regularizing and Optimizing LSTM Language Models Stephen Merity N. Keskar R. Socher 183 1,099 0 07 Aug 2017
A Robust Multi-Batch L-BFGS Method for Machine Learning A. Berahas Martin Takáč AAML ODL 113 44 0 26 Jul 2017
Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints Wenlong Mou Liwei Wang Xiyu Zhai Kai Zheng MLT 75 159 0 19 Jul 2017
Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes Lei Wu Zhanxing Zhu Weinan E ODL 71 220 0 30 Jun 2017
Exploring Generalization in Deep Learning Behnam Neyshabur Srinadh Bhojanapalli David A. McAllester Nathan Srebro FAtt 205 1,261 0 27 Jun 2017
Gradient Diversity: a Key Ingredient for Scalable Distributed Learning Dong Yin A. Pananjady Max Lam Dimitris Papailiopoulos Kannan Ramchandran Peter L. Bartlett 99 11 0 18 Jun 2017
A Closer Look at Memorization in Deep Networks Devansh Arpit Stanislaw Jastrzebski Nicolas Ballas David M. Krueger Emmanuel Bengio ... Tegan Maharaj Asja Fischer Aaron Courville Yoshua Bengio Simon Lacoste-Julien TDI 206 1,834 0 16 Jun 2017
Stochastic Training of Neural Networks via Successive Convex Approximations Simone Scardapane Paolo Di Lorenzo 43 9 0 15 Jun 2017
Recovery Guarantees for One-hidden-layer Neural Networks Kai Zhong Zhao Song Prateek Jain Peter L. Bartlett Inderjit S. Dhillon MLT 209 337 0 10 Jun 2017
Are Saddles Good Enough for Deep Learning? Adepu Ravi Sankar V. Balasubramanian 65 5 0 07 Jun 2017
Deep Learning: Generalization Requires Deep Compositional Feature Space Design Mrinal Haloi MLT OOD 34 3 0 06 Jun 2017
Classification regions of deep neural networks Alhussein Fawzi Seyed-Mohsen Moosavi-Dezfooli P. Frossard Stefano Soatto 86 51 0 26 May 2017
Train longer, generalize better: closing the generalization gap in large batch training of neural networks Elad Hoffer Itay Hubara Daniel Soudry ODL 207 803 0 24 May 2017
The Marginal Value of Adaptive Gradient Methods in Machine Learning Ashia Wilson Rebecca Roelofs Mitchell Stern Nathan Srebro Benjamin Recht ODL 125 1,035 0 23 May 2017
Bandit Structured Prediction for Neural Sequence-to-Sequence Learning Julia Kreutzer Artem Sokolov Stefan Riezler 85 49 0 21 Apr 2017
Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data Gintare Karolina Dziugaite Daniel M. Roy 128 820 0 31 Mar 2017
Efficient Private ERM for Smooth Objectives Jiaqi Zhang Kai Zheng Wenlong Mou Liwei Wang 62 145 0 29 Mar 2017
Sharp Minima Can Generalize For Deep Nets Laurent Dinh Razvan Pascanu Samy Bengio Yoshua Bengio ODL 147 774 0 15 Mar 2017
Data-Dependent Stability of Stochastic Gradient Descent Ilja Kuzborskij Christoph H. Lampert MLT 155 166 0 05 Mar 2017
Algorithmic stability and hypothesis complexity Tongliang Liu Gábor Lugosi Gergely Neu Dacheng Tao 101 92 0 28 Feb 2017
On architectural choices in deep learning: From network structure to gradient convergence and parameter estimation V. Ithapu Sathya Ravi Vikas Singh AI4CE 85 9 0 28 Feb 2017
Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis Maxim Raginsky Alexander Rakhlin Matus Telgarsky 88 521 0 13 Feb 2017
Fast Rates for Empirical Risk Minimization of Strict Saddle Problems Alon Gonen Shai Shalev-Shwartz 124 30 0 16 Jan 2017
Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond Levent Sagun Léon Bottou Yann LeCun UQCV 108 235 0 22 Nov 2016
Understanding deep learning requires rethinking generalization Chiyuan Zhang Samy Bengio Moritz Hardt Benjamin Recht Oriol Vinyals HAI 383 4,641 0 10 Nov 2016
Entropy-SGD: Biasing Gradient Descent Into Wide Valleys Pratik Chaudhari A. Choromańska Stefano Soatto Yann LeCun Carlo Baldassi C. Borgs J. Chayes Levent Sagun R. Zecchina ODL 129 775 0 06 Nov 2016
Deep Information Propagation S. Schoenholz Justin Gilmer Surya Ganguli Jascha Narain Sohl-Dickstein 128 371 0 04 Nov 2016
Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods A. Gautier Quynh N. Nguyen Matthias Hein 142 32 0 28 Oct 2016
Learning Scalable Deep Kernels with Recurrent Structure Maruan Al-Shedivat A. Wilson Yunus Saatchi Zhiting Hu Eric Xing BDL 106 106 0 27 Oct 2016
Membership Inference Attacks against Machine Learning Models Reza Shokri M. Stronati Congzheng Song Vitaly Shmatikov SLR MIALM MIACV 333 4,177 0 18 Oct 2016
Generalization Error Bounds for Optimization Algorithms via Stability Qi Meng Yue Wang Wei-neng Chen Taifeng Wang Zhiming Ma Tie-Yan Liu 38 8 0 27 Sep 2016