IS-ASGD: Importance Sampling Based Variance Reduction for ASGD

Variance reduction (VR) algorithms for accelerating the convergence of stochastic gradient descent (SGD) have received considerable attention in recent years. Two of its variants, stochastic variance-reduced gradient (SVRG) and importance sampling (IS), have achieved impressive progress. Meanwhile, asynchronous SGD (ASGD) is becoming increasingly important due to the ever-growing scale of optimization problems, and applying VR within ASGD to accelerate its convergence has attracted substantial research interest. Unlike the well-studied SVRG-style VR algorithms, which require at least double the computation per iteration and periodic computation of the full gradient, IS achieves an improved convergence bound with no extra online computation. This advantage makes it particularly suitable for ASGD, which targets the best possible performance and scalability. Since the application of IS in ASGD has not yet been studied, this paper investigates IS for efficient variance reduction in ASGD, yielding the IS-ASGD algorithm. We prove theoretically that IS-ASGD achieves a superior convergence bound to that of ASGD, which significantly accelerates training in practical deployments. Our experimental evaluation validates the effectiveness of IS-ASGD. We also make our evaluation source code publicly available on GitHub.
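To illustrate the importance-sampling idea the abstract refers to, the following is a minimal sketch of IS-weighted SGD on a synthetic least-squares problem. It is not the paper's IS-ASGD algorithm or its asynchronous implementation; the squared loss, the per-example scores based on smoothness bounds, and the step size are all illustrative assumptions. The key point it shows is that examples are drawn with non-uniform probabilities and each update is reweighted so the stochastic gradient remains unbiased, without any extra per-iteration gradient computation.

```python
# Sketch of importance-sampling SGD (illustrative only, not the authors' IS-ASGD).
# Example i is drawn with probability p_i proportional to a bound L_i on its
# gradient smoothness, and the step is reweighted by 1 / (n * p_i) so the
# sampled gradient stays an unbiased estimate of the full gradient.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Per-example smoothness bounds L_i = ||a_i||^2 for the squared loss
# f_i(x) = 0.5 * (a_i^T x - b_i)^2, used here as importance scores (assumption).
L = np.sum(A * A, axis=1)
p = L / L.sum()                      # non-uniform sampling distribution
w = 1.0 / (n * p)                    # unbiasedness correction weights

x = np.zeros(d)
eta = 0.5 / L.max()                  # conservative constant step size (assumption)
for t in range(20000):
    i = rng.choice(n, p=p)           # sample "harder" examples more often
    g = (A[i] @ x - b[i]) * A[i]     # gradient of f_i at the current iterate
    x -= eta * w[i] * g              # reweighted, unbiased SGD step

print("final objective:", 0.5 * np.mean((A @ x - b) ** 2))
```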