The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning

18 December 2017
Siyuan Ma, Raef Bassily, M. Belkin
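For context before the list of citing works: the paper studies mini-batch SGD with a constant step size in the interpolation regime, where an over-parametrized model can fit every training point exactly. Below is a minimal NumPy sketch of that regime; the problem (random least squares), the dimensions, and the step size and batch size are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): over-parametrized least
# squares with d > n and realizable labels, so the interpolation
# condition holds -- some w fits every training point exactly.
rng = np.random.default_rng(0)
n, d = 200, 500
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true

# Mini-batch SGD with a CONSTANT step size; no decay schedule.
w = np.zeros(d)
eta, batch = 5e-3, 16          # illustrative values, chosen for stability
for _ in range(10_000):
    idx = rng.integers(0, n, size=batch)      # sample a mini-batch
    residual = X[idx] @ w - y[idx]            # per-example prediction errors
    w -= eta * (X[idx].T @ residual) / batch  # stochastic gradient step

print("final training MSE:", float(np.mean((X @ w - y) ** 2)))
```

Because every per-example gradient vanishes at an interpolating solution, the stochastic gradient noise shrinks with the error itself, so the fixed step size converges without a decay schedule or a noise floor; this is the effect the paper quantifies.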

Papers citing "The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning"

26 / 76 papers shown

Logarithmic Pruning is All You Need
Laurent Orseau, Marcus Hutter, Omar Rivasplata
22 Jun 2020

SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation
Robert Mansel Gower, Othmane Sebbouh, Nicolas Loizou
18 Jun 2020

Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent
Yunwen Lei, Yiming Ying
MLT
15 Jun 2020

An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias
Lu Yu, Krishnakumar Balasubramanian, S. Volgushev, Murat A. Erdogdu
14 Jun 2020

A Unified Theory of Decentralized SGD with Changing Topology and Local Updates
Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, Sebastian U. Stich
FedML
23 Mar 2020

On the Convergence of Nesterov's Accelerated Gradient Method in Stochastic Settings
Mahmoud Assran, Michael G. Rabbat
27 Feb 2020

Understanding and Mitigating the Tradeoff Between Robustness and Accuracy
Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John C. Duchi, Percy Liang
AAML
25 Feb 2020

Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence
Nicolas Loizou, Sharan Vaswani, I. Laradji, Simon Lacoste-Julien
24 Feb 2020

Zeroth-Order Algorithms for Nonconvex Minimax Problems with Improved Complexities
Zhongruo Wang, Krishnakumar Balasubramanian, Shiqian Ma, Meisam Razaviyayn
22 Jan 2020

An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices
Xiaolong Ma, Wei Niu, Tianyun Zhang, Sijia Liu, Sheng Lin, ..., Xiang Chen, Jian Tang, Kaisheng Ma, Bin Ren, Yanzhi Wang
20 Jan 2020

Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
Vipul Gupta, S. Serrano, D. DeCoste
MoMe
07 Jan 2020

The Role of Neural Network Activation Functions
Rahul Parhi, Robert D. Nowak
05 Oct 2019

The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication
Sebastian U. Stich, Sai Praneeth Karimireddy
FedML
11 Sep 2019

PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices
Xiaolong Ma, Fu-Ming Guo, Wei Niu, Xue Lin, Jian Tang, Kaisheng Ma, Bin Ren, Yanzhi Wang
CVBM
06 Sep 2019

Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
Xinyan Li, Qilong Gu, Yingxue Zhou, Tiancong Chen, A. Banerjee
ODL
24 Jul 2019

Unified Optimal Analysis of the (Stochastic) Gradient Method
Sebastian U. Stich
09 Jul 2019

Does Learning Require Memorization? A Short Tale about a Long Tail
Vitaly Feldman
TDI
12 Jun 2019

Shallow Neural Networks for Fluid Flow Reconstruction with Limited Sensors
N. Benjamin Erichson, L. Mathelin, Z. Yao, Steven L. Brunton, Michael W. Mahoney, J. Nathan Kutz
AI4CE
20 Feb 2019

Reconciling modern machine learning practice and the bias-variance trade-off
M. Belkin, Daniel J. Hsu, Siyuan Ma, Soumik Mandal
28 Dec 2018

Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron
Sharan Vaswani, Francis R. Bach, Mark W. Schmidt
16 Oct 2018

Stochastic (Approximate) Proximal Point Methods: Convergence, Optimality, and Adaptivity
Hilal Asi, John C. Duchi
12 Oct 2018

The Effect of Network Width on the Performance of Large-batch Training
Lingjiao Chen, Hongyi Wang, Jinman Zhao, Dimitris Papailiopoulos, Paraschos Koutris
11 Jun 2018

Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate
Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry
FedML, MLT
05 Jun 2018

Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization
Navid Azizan, B. Hassibi
04 Jun 2018

Local SGD Converges Fast and Communicates Little
Sebastian U. Stich
FedML
24 May 2018

A Proximal Stochastic Gradient Method with Progressive Variance Reduction
Lin Xiao, Tong Zhang
ODL
19 Mar 2014