SVAG: Stochastic Variance Adjusted Gradient Descent and Biased
Stochastic Gradients
We examine biased gradient updates in variance-reduced stochastic gradient methods. For this purpose we introduce SVAG, a SAG/SAGA-like method with adjustable bias. We analyze SVAG under smoothness assumptions and provide step-size conditions for convergence that match or improve on those previously known for SAG and SAGA. The analysis reveals a difference in step-size requirements between applying SVAG to cocoercive operators and applying it to gradients of smooth functions, a difference not present in ordinary gradient descent; we verify this difference with numerical experiments. We also present a variant of SVAG that adaptively selects the bias and compare it numerically to SVAG on a set of classification problems. The adaptive variant frequently performs among the best and always improves on the worst-case performance of the non-adaptive method.
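To make the "adjustable bias" concrete, the following is a minimal sketch of a SAG/SAGA-style update with a bias parameter `theta`, where `theta = 1` recovers the unbiased SAGA gradient estimate and `theta = 1/n` the biased SAG estimate. The function and parameter names, and the exact parameterization of the bias, are assumptions for illustration; the paper's definition of SVAG may differ in detail.

```python
import numpy as np

def svag(grad_i, x0, n, theta, step, iters, seed=0):
    """Hypothetical SAG/SAGA-like loop with adjustable bias theta.

    theta = 1   -> unbiased (SAGA-style) gradient estimate
    theta = 1/n -> biased (SAG-style) gradient estimate
    """
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    # Table of stored per-sample gradients, as in SAG/SAGA.
    y = np.array([grad_i(i, x) for i in range(n)])
    ybar = y.mean(axis=0)
    for _ in range(iters):
        i = rng.integers(n)
        g = grad_i(i, x)
        # Gradient estimate: stored mean plus theta-scaled innovation.
        # Biased for theta != 1, since E[v] != full gradient in general.
        v = theta * (g - y[i]) + ybar
        ybar += (g - y[i]) / n  # keep running mean consistent with table
        y[i] = g
        x -= step * v
    return x

# Demo on least squares: f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2
n, d = 50, 5
rng = np.random.default_rng(1)
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true  # noiseless targets, so the minimizer is x_true
grad_i = lambda i, x: (A[i] @ x - b[i]) * A[i]
x_hat = svag(grad_i, np.zeros(d), n, theta=1.0, step=0.01, iters=20000)
print(np.linalg.norm(x_hat - x_true))
```

With `theta = 1` this is exactly the SAGA update; lowering `theta` damps the stochastic innovation term at the cost of bias, which is the trade-off the abstract's step-size analysis quantifies.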