Revisiting SGD with Increasingly Weighted Averaging: Optimization and Generalization Perspectives

9 March 2020

Papers cited by "Revisiting SGD with Increasingly Weighted Averaging: Optimization and Generalization Perspectives"

On exponential convergence of SGD in non-convex over-parametrized learning. Raef Bassily, M. Belkin, Siyuan Ma. 06 Nov 2018.
A Unified Analysis of Stochastic Momentum Methods for Deep Learning. Yan Yan, Tianbao Yang, Zhe Li, Qihang Lin, Yi Yang. 30 Aug 2018.
Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions. Zaiyi Chen, Zhuoning Yuan, Jinfeng Yi, Bowen Zhou, Enhong Chen, Tianbao Yang. 20 Aug 2018.
Stochastic subgradient method converges at the rate O(k^{-1/4}) on weakly convex functions. Damek Davis, Dmitriy Drusvyatskiy. 08 Feb 2018.
Proximally Guided Stochastic Subgradient Method for Nonsmooth, Nonconvex Problems. Damek Davis, Benjamin Grimmer. 12 Jul 2017.
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition. Hamed Karimi, J. Nutini, Mark Schmidt. 16 Aug 2016.
Stochastic Variance Reduction for Nonconvex Optimization. Sashank J. Reddi, Ahmed S. Hefny, S. Sra, Barnabás Póczós, Alex Smola. 19 Mar 2016.
Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. 10 Dec 2015.
Train faster, generalize better: Stability of stochastic gradient descent. Moritz Hardt, Benjamin Recht, Y. Singer. 03 Sep 2015.
Adam: A Method for Stochastic Optimization. Diederik P. Kingma, Jimmy Ba. 22 Dec 2014.
Deep learning with Elastic Averaging SGD. Sixin Zhang, A. Choromańska, Yann LeCun. 20 Dec 2014.
Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming. Saeed Ghadimi, Guanghui Lan. 22 Sep 2013.
A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method. Simon Lacoste-Julien, Mark Schmidt, Francis R. Bach. 10 Dec 2012.
Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes. Ohad Shamir, Tong Zhang. 08 Dec 2012.
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization. Alexander Rakhlin, Ohad Shamir, Karthik Sridharan. 26 Sep 2011.