Semi-Stochastic Gradient Descent Methods

5 December 2013
Jakub Konečný
Peter Richtárik
arXiv:1312.1666
Abstract

In this paper we study the problem of minimizing the average of a large number ($n$) of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs, in each of which a single full gradient and a random number of stochastic gradients is computed, following a geometric law. The total work needed for the method to output an $\varepsilon$-accurate solution in expectation, measured in the number of passes over data, or equivalently, in units equivalent to the computation of a single gradient of the loss, is $O((\kappa/n)\log(1/\varepsilon))$, where $\kappa$ is the condition number. This is achieved by running the method for $O(\log(1/\varepsilon))$ epochs, with a single gradient evaluation and $O(\kappa)$ stochastic gradient evaluations in each. The SVRG method of Johnson and Zhang arises as a special case. If our method is limited to a single epoch only, it needs to evaluate at most $O((\kappa/\varepsilon)\log(1/\varepsilon))$ stochastic gradients. In contrast, SVRG requires $O(\kappa/\varepsilon^2)$ stochastic gradients. To illustrate our theoretical results, we note that S2GD only needs a workload equivalent to about 2.1 full gradient evaluations to find a $10^{-6}$-accurate solution for a problem with $n=10^9$ and $\kappa=10^3$.
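
To make the epoch structure concrete, here is a minimal Python sketch of the scheme the abstract describes: each epoch evaluates one full gradient and then a geometrically distributed number of semi-stochastic (variance-reduced) steps. The function name `s2gd`, the step size `h`, the geometric parameter `beta`, the truncation `max_inner`, and the least-squares demo are illustrative assumptions for exposition, not the parameter choices or analysis from the paper.

```python
# Hedged sketch of an S2GD-style loop: one full gradient per epoch plus a
# random (geometric) number of stochastic steps. Parameters are illustrative.
import numpy as np

def s2gd(grad_i, n, x0, epochs=10, h=0.1, beta=0.99, max_inner=1000, seed=0):
    """grad_i(i, x) returns the gradient of the i-th loss function at x."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(epochs):
        # One full pass over the n loss functions to get the exact gradient at x.
        g = sum(grad_i(i, x) for i in range(n)) / n
        # Random number of inner steps, drawn from a truncated geometric law.
        t_max = min(int(rng.geometric(1 - beta)), max_inner)
        y = x.copy()
        for _ in range(t_max):
            i = rng.integers(n)
            # Semi-stochastic direction: full gradient at x, corrected by the
            # difference of stochastic gradients at y and at x.
            y -= h * (g + grad_i(i, y) - grad_i(i, x))
        x = y
    return x

# Tiny demo on a least-squares problem (illustrative only).
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A, b = rng.normal(size=(200, 5)), rng.normal(size=200)
    grad_i = lambda i, x: (A[i] @ x - b[i]) * A[i]
    x_hat = s2gd(grad_i, n=len(b), x0=np.zeros(5))
    print("residual norm:", np.linalg.norm(A @ x_hat - b))
```

Note that fixing the number of inner steps instead of drawing it at random recovers an SVRG-style epoch, which is consistent with the abstract's remark that SVRG arises as a special case.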
