Accelerating Gossip SGD with Periodic Global Averaging

19 May 2021
Yiming Chen
Kun Yuan
Yingya Zhang
Pan Pan
Yinghui Xu
W. Yin
arXiv:2105.09080
Abstract

Communication overhead hinders the scalability of large-scale distributed training. Gossip SGD, where each node averages only with its neighbors, is more communication-efficient than the prevalent parallel SGD. However, its convergence rate is inversely proportional to the quantity $1-\beta$, which measures the network connectivity. On large and sparse networks, where $1-\beta \to 0$, Gossip SGD requires more iterations to converge, which offsets its communication benefit. This paper introduces Gossip-PGA, which adds Periodic Global Averaging into Gossip SGD. Its transient stage, i.e., the number of iterations required to reach the asymptotic linear-speedup stage, improves from $\Omega(\beta^4 n^3/(1-\beta)^4)$ to $\Omega(\beta^4 n^3 H^4)$ for non-convex problems. The influence of network topology on Gossip-PGA can be controlled by the averaging period $H$. Its transient-stage complexity is also superior to that of Local SGD, which has order $\Omega(n^3 H^4)$. Empirical results of large-scale training on image classification (ResNet50) and language modeling (BERT) validate our theoretical findings.
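To make the update rule concrete, here is a minimal single-process simulation of Gossip SGD with periodic global averaging, as described in the abstract: every node takes a local SGD step, then averages with its neighbors through a mixing matrix, and every H iterations all nodes replace their parameters with the global average. The ring topology, the toy quadratic per-node objectives, and all variable names are illustrative assumptions for this sketch, not the paper's implementation or experimental setup.

# Sketch of Gossip-PGA on a toy problem (assumed setup, not the authors' code).
import numpy as np

n, d = 8, 10          # number of nodes, parameter dimension
H = 4                 # global-averaging period (the hyperparameter H)
lr = 0.1              # learning rate
steps = 100

rng = np.random.default_rng(0)
targets = rng.normal(size=(n, d))   # node i minimizes 0.5 * ||x_i - targets[i]||^2
x = np.zeros((n, d))                # one parameter copy per node

# Ring-topology gossip (mixing) matrix W: each node averages with its two neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0

for t in range(1, steps + 1):
    grads = x - targets                          # per-node gradients (stochastic in practice)
    x = x - lr * grads                           # local SGD step on every node
    if t % H == 0:
        x = np.tile(x.mean(axis=0), (n, 1))      # periodic global averaging
    else:
        x = W @ x                                # gossip: average with neighbors only

print("consensus distance:", np.linalg.norm(x - x.mean(axis=0)))

In a real distributed setting the gossip step would be a sparse neighbor exchange and the global average an occasional all-reduce; the period H trades off how often that expensive collective is paid against how strongly the network topology (through $1-\beta$) slows consensus.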
