
Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation

Abstract

We address the problem of solving strongly convex and smooth minimization problems using the stochastic gradient descent (SGD) algorithm with a constant step size. Previous works suggested combining the Polyak-Ruppert averaging procedure with the Richardson-Romberg extrapolation to reduce the asymptotic bias of SGD at the expense of a mild increase in variance. We significantly extend previous results by providing an expansion of the mean-squared error of the resulting estimator with respect to the number of iterations $n$. We show that the root mean-squared error can be decomposed into the sum of two terms: a leading one of order $\mathcal{O}(n^{-1/2})$ with explicit dependence on a minimax-optimal asymptotic covariance matrix, and a second-order term of order $\mathcal{O}(n^{-3/4})$, where the power $3/4$ is the best currently known. We also extend this result to higher-order moment bounds. Our analysis relies on the properties of the SGD iterates viewed as a time-homogeneous Markov chain. In particular, we establish that this chain is geometrically ergodic with respect to a suitably defined weighted Wasserstein semimetric.
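
A minimal sketch of the estimator the abstract describes, under illustrative assumptions: run Polyak-Ruppert averaged SGD twice with constant step sizes gamma and 2*gamma, then combine the two averaged iterates as 2*theta_bar(gamma) - theta_bar(2*gamma), which cancels the leading bias term proportional to the step size. The function names, the toy least-squares objective, and the choice of independent noise across the two runs are assumptions made for illustration, not details taken from the paper.

import numpy as np

def averaged_sgd(grad_oracle, theta0, step, n_iters, rng):
    # Constant-step SGD with Polyak-Ruppert averaging of the iterates.
    theta = np.array(theta0, dtype=float)
    theta_bar = np.zeros_like(theta)
    for k in range(1, n_iters + 1):
        theta = theta - step * grad_oracle(theta, rng)
        theta_bar += (theta - theta_bar) / k  # running mean of the iterates
    return theta_bar

def richardson_romberg_sgd(grad_oracle, theta0, step, n_iters, seed=0):
    # Richardson-Romberg extrapolation: combine averaged-SGD runs with
    # step sizes `step` and `2 * step` so the O(step) bias terms cancel.
    rng_a = np.random.default_rng(seed)
    rng_b = np.random.default_rng(seed + 1)  # independent noise: an illustrative choice
    theta_bar_small = averaged_sgd(grad_oracle, theta0, step, n_iters, rng_a)
    theta_bar_large = averaged_sgd(grad_oracle, theta0, 2 * step, n_iters, rng_b)
    return 2.0 * theta_bar_small - theta_bar_large

# Toy strongly convex, smooth problem: least squares with noisy gradients.
A = np.diag([1.0, 2.0, 5.0])
theta_star = np.array([1.0, -1.0, 0.5])

def grad_oracle(theta, rng):
    # Unbiased stochastic gradient of 0.5 * (theta - theta_star)' A (theta - theta_star).
    return A @ (theta - theta_star) + rng.normal(scale=0.1, size=theta.shape)

estimate = richardson_romberg_sgd(grad_oracle, np.zeros(3), step=0.05, n_iters=20_000)
print(np.linalg.norm(estimate - theta_star))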

@article{sheshukova2025_2410.05106,
  title={Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation},
  author={Marina Sheshukova and Denis Belomestny and Alain Durmus and Eric Moulines and Alexey Naumov and Sergey Samsonov},
  journal={arXiv preprint arXiv:2410.05106},
  year={2025}
}