IS-ASGD: Importance Sampling Accelerated Asynchronous SGD on Multi-Core Systems

Abstract

The parallel SGD (PSGD) algorithm has been widely used to accelerate stochastic optimization tasks. However, its scalability is severely limited by the synchronization between threads. The asynchronous SGD (ASGD) algorithm was proposed to improve PSGD's scalability by allowing non-synchronized model updates. In practice, lock-free ASGD is preferable since it requires no lock operations on concurrent updates of the global model and thus achieves optimal scalability. It also maintains almost the same convergence bound when certain conditions (convexity, continuity, and sparsity) are met. Following the success of lock-free ASGD, researchers developed its variance reduction (VR) variants, i.e., VR-integrated lock-free ASGD, to achieve a superior convergence bound. We note that the VR techniques studied so far in the lock-free ASGD context are all based on variance-reduced gradients, e.g., SVRG and SAGA. Unfortunately, estimating a variance-reduced gradient requires computing the full gradient periodically and doubles the computation cost of each iteration, which greatly decreases the scalability of ASGD. On the other hand, importance sampling (IS), another elegant and practical VR technique, has not been studied or implemented in conjunction with lock-free ASGD. An important advantage of IS is that, unlike variance-reduced-gradient algorithms, it achieves variance reduction through weighted sampling, which introduces no extra online computation and thus preserves the original scalability of ASGD. We are therefore motivated to study the application of IS in lock-free ASGD and propose the IS-ASGD algorithm, which achieves a superior convergence bound while maintaining the high scalability of ASGD. Experimental evaluations on datasets widely adopted in related research verify the effectiveness of IS-ASGD.
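To illustrate the core idea behind importance sampling for SGD, here is a minimal sequential sketch (not the paper's asynchronous, lock-free implementation). It assumes a least-squares objective and samples example i with probability proportional to its per-example smoothness constant, scaling the gradient by 1/(n·p_i) so the update stays unbiased; the problem setup, step size, and sampling weights are illustrative choices, not taken from the paper.

```python
import numpy as np

# Importance-sampling SGD sketch for f(w) = (1/2n) * ||Xw - y||^2.
# Example i is drawn with probability p_i proportional to ||x_i||^2
# (its smoothness constant); the gradient is reweighted by 1/(n * p_i)
# so that E[update] equals the full gradient of f.

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true                      # noiseless targets for a clean demo

# Weighted sampling distribution: no per-iteration extra gradient work,
# only this one-time precomputation over the data.
p = np.sum(X**2, axis=1)
p /= p.sum()

w = np.zeros(d)
lr = 0.05
for _ in range(5000):
    i = rng.choice(n, p=p)
    g = (X[i] @ w - y[i]) * X[i]    # gradient of (1/2)(x_i . w - y_i)^2
    w -= lr * g / (n * p[i])        # importance weight keeps the estimate unbiased
```

Unlike SVRG-style estimators, this loop never computes a full gradient during the iterations; the variance reduction comes entirely from the precomputed sampling distribution, which is why IS preserves the per-iteration cost of plain SGD.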
