Train faster, generalize better: Stability of stochastic gradient descent

3 September 2015 · Moritz Hardt, Benjamin Recht, Y. Singer
ArXiv (abs) · PDF · HTML

Papers citing "Train faster, generalize better: Stability of stochastic gradient descent"

50 / 679 papers shown
Train simultaneously, generalize better: Stability of gradient-based minimax learners
  Farzan Farnia, Asuman Ozdaglar · 73 / 48 / 0 · 23 Oct 2020
Feature Selection for Huge Data via Minipatch Learning
  Tianyi Yao, Genevera I. Allen · 38 / 10 / 0 · 16 Oct 2020
The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers
  Preetum Nakkiran, Behnam Neyshabur, Hanie Sedghi · OffRL · 104 / 11 / 0 · 16 Oct 2020
Deep generative demixing: Recovering Lipschitz signals from noisy subgaussian mixtures
  Aaron Berk · 43 / 0 / 0 · 13 Oct 2020
Explaining Neural Matrix Factorization with Gradient Rollback
  Carolin (Haas) Lawrence, T. Sztyler, Mathias Niepert · 102 / 12 / 0 · 12 Oct 2020
How Does Mixup Help With Robustness and Generalization?
  Linjun Zhang, Zhun Deng, Kenji Kawaguchi, Amirata Ghorbani, James Zou · AAML · 110 / 252 / 0 · 09 Oct 2020
Learning Binary Decision Trees by Argmin Differentiation
  Valentina Zantedeschi, Matt J. Kusner, Vlad Niculae · 64 / 13 / 0 · 09 Oct 2020
Kernel regression in high dimensions: Refined analysis beyond double descent
  Fanghui Liu, Zhenyu Liao, Johan A. K. Suykens · 86 / 50 / 0 · 06 Oct 2020
Learning Optimal Representations with the Decodable Information Bottleneck
  Yann Dubois, Douwe Kiela, D. Schwab, Ramakrishna Vedantam · 122 / 43 / 0 · 27 Sep 2020
Faster Biological Gradient Descent Learning
  H. Li · ODL · 22 / 1 / 0 · 27 Sep 2020
How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks
  Keyulu Xu, Mozhi Zhang, Jingling Li, S. Du, Ken-ichi Kawarabayashi, Stefanie Jegelka · MLT · 184 / 313 / 0 · 24 Sep 2020
Implicit Gradient Regularization
  David Barrett, Benoit Dherin · 104 / 152 / 0 · 23 Sep 2020
Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization
  Pan Zhou, Xiaotong Yuan · 50 / 6 / 0 · 18 Sep 2020
GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training
  Tianle Cai, Shengjie Luo, Keyulu Xu, Di He, Tie-Yan Liu, Liwei Wang · GNN · 108 / 167 / 0 · 07 Sep 2020
Hybrid Differentially Private Federated Learning on Vertically Partitioned Data
  Chang Wang, Jian Liang, Mingkai Huang, Bing Bai, Kun Bai, Hao Li · FedML · 122 / 39 / 0 · 06 Sep 2020
Making Coherence Out of Nothing At All: Measuring the Evolution of Gradient Alignment
  S. Chatterjee, Piotr Zielinski · 54 / 8 / 0 · 03 Aug 2020
Principles and Algorithms for Forecasting Groups of Time Series: Locality and Globality
  Pablo Montero-Manso, Rob J. Hyndman · AI4TS · 102 / 139 / 0 · 02 Aug 2020
Cross-validation Confidence Intervals for Test Error
  Pierre Bayle, Alexandre Bayle, Lucas Janson, Lester W. Mackey · 73 / 40 / 0 · 24 Jul 2020
Tighter Generalization Bounds for Iterative Differentially Private Learning Algorithms
  Fengxiang He, Bohan Wang, Dacheng Tao · FedML · 55 / 18 / 0 · 18 Jul 2020
Measurement error models: from nonparametric methods to deep neural networks
  Zhirui Hu, Z. Ke, Jun S. Liu · 31 / 4 / 0 · 15 Jul 2020
Stochastic Hamiltonian Gradient Methods for Smooth Games
  Nicolas Loizou, Hugo Berard, Alexia Jolicoeur-Martineau, Pascal Vincent, Simon Lacoste-Julien, Ioannis Mitliagkas · 69 / 50 / 0 · 08 Jul 2020
Meta-Learning with Network Pruning
  Hongduan Tian, Bo Liu, Xiaotong Yuan, Qingshan Liu · 61 / 27 / 0 · 07 Jul 2020
AdaSGD: Bridging the gap between SGD and Adam
  Jiaxuan Wang, Jenna Wiens · 77 / 10 / 0 · 30 Jun 2020
Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum
  Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama · ODL · 155 / 48 / 0 · 29 Jun 2020
Lipschitzness Is All You Need To Tame Off-policy Generative Adversarial Imitation Learning
  Lionel Blondé, Pablo Strasser, Alexandros Kalousis · 90 / 22 / 0 · 28 Jun 2020
Stability Enhanced Privacy and Applications in Private Stochastic Gradient Descent
  Lauren Watson, Benedek Rozemberczki, Rik Sarkar · 21 / 1 / 0 · 25 Jun 2020
Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes
  Shuai Zheng, Yanghua Peng, Sheng Zha, Mu Li · ODL · 72 / 21 / 0 · 24 Jun 2020
ByGARS: Byzantine SGD with Arbitrary Number of Attackers
  Jayanth Reddy Regatti, Hao Chen, Abhishek Gupta · FedML · AAML · 70 / 4 / 0 · 24 Jun 2020
Understanding Deep Architectures with Reasoning Layer
  Xinshi Chen, Yufei Zhang, C. Reisinger, Le Song · AI4CE · 127 / 7 / 0 · 24 Jun 2020
Training (Overparametrized) Neural Networks in Near-Linear Time
  Jan van den Brand, Binghui Peng, Zhao Song, Omri Weinstein · ODL · 91 / 83 / 0 · 20 Jun 2020
Stochastic Gradient Descent in Hilbert Scales: Smoothness, Preconditioning and Earlier Stopping
  Nicole Mücke, Enrico Reiss · 47 / 7 / 0 · 18 Jun 2020
SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation
  Robert Mansel Gower, Othmane Sebbouh, Nicolas Loizou · 135 / 76 / 0 · 18 Jun 2020
Federated Accelerated Stochastic Gradient Descent
  Honglin Yuan, Tengyu Ma · FedML · 104 / 180 / 0 · 16 Jun 2020
Shape Matters: Understanding the Implicit Bias of the Noise Covariance
  Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma · 219 / 95 / 0 · 15 Jun 2020
Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent
  Yunwen Lei, Yiming Ying · MLT · 99 / 129 / 0 · 15 Jun 2020
Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
  Raef Bassily, Vitaly Feldman, Cristóbal Guzmán, Kunal Talwar · MLT · 83 / 198 / 0 · 12 Jun 2020
Revisiting Explicit Regularization in Neural Networks for Well-Calibrated Predictive Uncertainty
  Taejong Joo, U. Chung · BDL · UQCV · 34 / 0 / 0 · 11 Jun 2020
Speedy Performance Estimation for Neural Architecture Search
  Binxin Ru, Clare Lyle, Lisa Schut, M. Fil, Mark van der Wilk, Y. Gal · 107 / 37 / 0 · 08 Jun 2020
Bayesian Neural Network via Stochastic Gradient Descent
  Abhinav Sagar · UQCV · BDL · 62 / 2 / 0 · 04 Jun 2020
Instability, Computational Efficiency and Statistical Accuracy
  Nhat Ho, K. Khamaru, Raaz Dwivedi, Martin J. Wainwright, Michael I. Jordan, Bin Yu · 72 / 20 / 0 · 22 May 2020
Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping
  Eduard A. Gorbunov, Marina Danilova, Alexander Gasnikov · 79 / 123 / 0 · 21 May 2020
LALR: Theoretical and Experimental validation of Lipschitz Adaptive Learning Rate in Regression and Neural Networks
  Snehanshu Saha, Tejas Prashanth, Suraj Aralihalli, Sumedh Basarkod, T. Sudarshan, S. Dhavala · 35 / 4 / 0 · 19 May 2020
Scaling-up Distributed Processing of Data Streams for Machine Learning
  M. Nokleby, Haroon Raja, W. Bajwa · 69 / 15 / 0 · 18 May 2020
Private Stochastic Convex Optimization: Optimal Rates in Linear Time
  Vitaly Feldman, Tomer Koren, Kunal Talwar · 85 / 211 / 0 · 10 May 2020
Stochastic batch size for adaptive regularization in deep network optimization
  Kensuke Nakamura, Stefano Soatto, Byung-Woo Hong · ODL · 51 / 6 / 0 · 14 Apr 2020
Detached Error Feedback for Distributed SGD with Random Sparsification
  An Xu, Heng-Chiao Huang · 71 / 9 / 0 · 11 Apr 2020
R-FORCE: Robust Learning for Random Recurrent Neural Networks
  Yang Zheng, Eli Shlizerman · OOD · 41 / 5 / 0 · 25 Mar 2020
A termination criterion for stochastic gradient descent for binary classification
  Sina Baghal, Courtney Paquette, S. Vavasis · 39 / 0 / 0 · 23 Mar 2020
Weak and Strong Gradient Directions: Explaining Memorization, Generalization, and Hardness of Examples at Scale
  Piotr Zielinski, Shankar Krishnan, S. Chatterjee · ODL · 129 / 2 / 0 · 16 Mar 2020
Interference and Generalization in Temporal Difference Learning
  Emmanuel Bengio, Joelle Pineau, Doina Precup · 88 / 61 / 0 · 13 Mar 2020