Convex SGD: Generalization Without Early Stopping

Abstract

We consider the generalization error associated with stochastic gradient descent on a smooth convex function over a compact set. We show the first bound on the generalization error that vanishes when the number of iterations $T$ and the dataset size $n$ go to infinity at arbitrary rates; our bound scales as $\tilde{O}(1/\sqrt{T} + 1/\sqrt{n})$ with step-size $\alpha_t = 1/\sqrt{t}$. In particular, strong convexity is not needed for stochastic gradient descent to generalize well.
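To make the setting concrete, the following is a minimal sketch of projected SGD with step size $\alpha_t = 1/\sqrt{t}$ on a compact set (here a Euclidean ball), assuming a generic smooth convex per-sample loss; the function names, the least-squares instance, and all parameters are illustrative and not taken from the paper.

```python
import numpy as np

def projected_sgd(stochastic_grad, project, w0, n, T, seed=0):
    """Projected SGD with step size alpha_t = 1 / sqrt(t).

    stochastic_grad(w, i): gradient of the loss on sample i at the point w
    project(w):            Euclidean projection onto the compact feasible set
    w0:                    initial iterate (inside the feasible set)
    n:                     number of training samples
    T:                     number of iterations
    Returns the running average of the iterates, the usual output in
    convex SGD analyses.
    """
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    avg = np.zeros_like(w)
    for t in range(1, T + 1):
        i = rng.integers(n)                     # draw a training example uniformly
        w = project(w - stochastic_grad(w, i) / np.sqrt(t))
        avg += (w - avg) / t                    # incremental average of iterates
    return avg

# Illustrative instance (hypothetical): least squares constrained to a ball of radius R.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d, R = 200, 5, 1.0
    X, y = rng.normal(size=(n, d)), rng.normal(size=n)
    grad = lambda w, i: (X[i] @ w - y[i]) * X[i]
    project = lambda w: w if np.linalg.norm(w) <= R else R * w / np.linalg.norm(w)
    w_bar = projected_sgd(grad, project, np.zeros(d), n, T=10_000)
    print("average iterate:", w_bar)
```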

