Momentum-Based Variance Reduction in Non-Convex SGD

24 May 2019
Ashok Cutkosky
Francesco Orabona
Abstract

Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points. However, variance reduction techniques typically require carefully tuned learning rates and a willingness to use excessively large "mega-batches" in order to achieve their improved results. We present a new algorithm, STORM, that does not require any batches and makes use of adaptive learning rates, enabling simpler implementation and less hyperparameter tuning. Our technique for removing the batches uses a variant of momentum to achieve variance reduction in non-convex optimization. On smooth losses $F$, STORM finds a point $\boldsymbol{x}$ with $\mathbb{E}[\|\nabla F(\boldsymbol{x})\|] \le O(1/\sqrt{T} + \sigma^{1/3}/T^{1/3})$ in $T$ iterations with $\sigma^2$ variance in the gradients, matching the optimal rate but without requiring knowledge of $\sigma$.
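
The momentum-based, variance-reduced gradient estimator described in the abstract can be sketched in a few lines. The Python snippet below is an illustrative rendering under stated assumptions, not the authors' reference implementation: the helper names grad and sample, and the constants k, w, and c, are hypothetical placeholders. Each iteration reuses one fresh sample at both the current and previous iterate, folds the gradient difference into the running estimate with a momentum weight proportional to the squared step size, and sets the step size adaptively from the accumulated squared gradient norms, so no mega-batches are needed.

import numpy as np

def storm_sketch(grad, sample, x0, T, k=0.1, w=0.1, c=1.0):
    """Sketch of a STORM-style loop (hypothetical hyperparameters k, w, c).

    grad(x, xi) -> stochastic gradient of the loss at x for sample xi
    sample()    -> draws one fresh data sample xi
    """
    x = np.asarray(x0, dtype=float)
    xi = sample()
    d = grad(x, xi)                          # initial estimate d_1 = grad f(x_1, xi_1)
    G_sum = np.linalg.norm(d) ** 2           # running sum of squared gradient norms
    for _ in range(T):
        eta = k / (w + G_sum) ** (1.0 / 3.0)   # adaptive learning rate
        x_prev = x
        x = x - eta * d                        # descent step
        xi = sample()                          # one fresh sample, reused at both points
        g_new = grad(x, xi)
        g_old = grad(x_prev, xi)
        a = min(1.0, c * eta ** 2)             # momentum parameter a ~ c * eta^2
        d = g_new + (1.0 - a) * (d - g_old)    # variance-reduced momentum estimate
        G_sum += np.linalg.norm(g_new) ** 2
    return x

The key design point, as the abstract emphasizes, is that the estimator update plays the role of a mega-batch: the correction term (d - g_old) cancels much of the sampling noise, while the adaptive step size removes the need to know the variance level in advance.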
