
Convex SGD: Generalization Without Early Stopping

24 pages (main) + 3 pages (bibliography), 1 figure, 1 table
Abstract

We consider the generalization error of stochastic gradient descent on a smooth convex function over a compact set. We show the first bound on the generalization error that vanishes as the number of iterations $T$ and the dataset size $n$ go to infinity at arbitrary rates; our bound scales as $\tilde{O}(1/\sqrt{T} + 1/\sqrt{n})$ with step size $\alpha_t = 1/\sqrt{t}$. In particular, strong convexity is not needed for stochastic gradient descent to generalize well.
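The setting described above (stochastic gradients on a convex objective over a compact set, step size $\alpha_t = 1/\sqrt{t}$) corresponds to projected SGD. Below is a minimal illustrative sketch, not the paper's algorithm or experiments: the objective, the noise model, and the Euclidean-ball constraint set are all assumptions chosen for concreteness.

```python
import numpy as np

def projected_sgd(grad_sample, x0, radius, T, rng):
    """Projected SGD with step size alpha_t = 1/sqrt(t) over a
    compact Euclidean ball (hypothetical constraint set for illustration)."""
    x = np.array(x0, dtype=float)
    for t in range(1, T + 1):
        g = grad_sample(x, rng)      # stochastic gradient estimate
        x = x - g / np.sqrt(t)       # alpha_t = 1/sqrt(t)
        norm = np.linalg.norm(x)
        if norm > radius:            # Euclidean projection onto the ball
            x = x * (radius / norm)
    return x

# Toy smooth convex objective f(x) = ||x - b||^2 / 2 with additive
# gradient noise (assumed example, not from the paper).
rng = np.random.default_rng(0)
b = np.array([0.5, -0.3])
grad = lambda x, rng: (x - b) + 0.1 * rng.standard_normal(x.shape)
x_T = projected_sgd(grad, x0=[2.0, 2.0], radius=1.0, T=5000, rng=rng)
```

The decaying step size damps the gradient noise over time, so the final iterate lands close to the constrained minimizer without any strong-convexity assumption on $f$.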
