arXiv: 2105.08023

Removing Data Heterogeneity Influence Enhances Network Topology Dependence of Decentralized SGD

17 May 2021
Kun Yuan
Sulaiman A. Alghunaim
Xinmeng Huang
Abstract

We consider decentralized stochastic optimization problems, in which a network of $n$ nodes, each owning a local cost function, cooperates to find a minimizer of the globally averaged cost. A widely studied decentralized algorithm for this problem is decentralized SGD (D-SGD), in which each node averages only with its neighbors. D-SGD is communication-efficient per iteration, but it is very sensitive to the network topology. For smooth objective functions, the transient stage of D-SGD (the number of iterations the algorithm must run before reaching the linear speedup stage) is on the order of $\Omega(n/(1-\beta)^2)$ and $\Omega(n^3/(1-\beta)^4)$ for strongly and generally convex cost functions, respectively, where $1-\beta \in (0,1)$ is a topology-dependent quantity that approaches $0$ for a large and sparse network. Hence, D-SGD suffers from slow convergence on large and sparse networks. In this work, we study the non-asymptotic convergence property of the D$^2$/Exact-diffusion algorithm. By eliminating the influence of data heterogeneity between nodes, D$^2$/Exact-diffusion is shown to have an enhanced transient stage that is on the order of $\tilde{\Omega}(n/(1-\beta))$ and $\Omega(n^3/(1-\beta)^2)$ for strongly and generally convex cost functions, respectively. Moreover, when D$^2$/Exact-diffusion is implemented with gradient accumulation and multi-round gossip communications, its transient stage can be further improved to $\tilde{\Omega}(1/(1-\beta)^{\frac{1}{2}})$ and $\tilde{\Omega}(n/(1-\beta))$ for strongly and generally convex cost functions, respectively. To our knowledge, these results for D$^2$/Exact-diffusion have the best (i.e., weakest) dependence on network topology among existing decentralized algorithms. We also conduct numerical simulations to validate our theories.
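To make the contrast between the two algorithms concrete, the following is a minimal sketch, not the authors' implementation. It assumes a ring of 10 nodes with uniform 1/3 mixing weights, scalar quadratic local costs f_i(x) = 0.5 * h_i * (x - c_i)^2 with made-up heterogeneous data, and the Exact-diffusion recursion in its commonly stated adapt, correct, combine form with the combination matrix (I + W)/2. All function names, the problem data, and the step size are illustrative assumptions.

```python
import random


def ring_mix(v):
    """One gossip step on a ring: each node averages with its two neighbors (weights 1/3)."""
    n = len(v)
    return [(v[i - 1] + v[i] + v[(i + 1) % n]) / 3.0 for i in range(n)]


def make_problem(n=10):
    """Illustrative heterogeneous quadratics f_i(x) = 0.5 * h[i] * (x - c[i])**2."""
    h = [0.5 + 1.5 * i / (n - 1) for i in range(n)]    # local curvatures
    c = [-5.0 + 10.0 * i / (n - 1) for i in range(n)]  # local minimizers
    x_star = sum(hi * ci for hi, ci in zip(h, c)) / sum(h)  # minimizer of the averaged cost
    return h, c, x_star


def stochastic_grads(x, h, c, noise_std, rng):
    """Local gradients at each node's iterate, plus optional Gaussian noise."""
    return [hi * (xi - ci) + noise_std * rng.gauss(0.0, 1.0)
            for xi, hi, ci in zip(x, h, c)]


def d_sgd(h, c, alpha, iters, noise_std=0.0, seed=0):
    """D-SGD (adapt-then-combine): local gradient step, then neighbor averaging."""
    rng = random.Random(seed)
    x = [0.0] * len(h)
    for _ in range(iters):
        g = stochastic_grads(x, h, c, noise_std, rng)
        x = ring_mix([xi - alpha * gi for xi, gi in zip(x, g)])
    return x


def exact_diffusion(h, c, alpha, iters, noise_std=0.0, seed=0):
    """Exact-diffusion: adapt, correct with the previous adapt step, then combine."""
    rng = random.Random(seed)
    x = [0.0] * len(h)
    psi_prev = list(x)  # initialization psi^0 = x^0
    for _ in range(iters):
        g = stochastic_grads(x, h, c, noise_std, rng)
        psi = [xi - alpha * gi for xi, gi in zip(x, g)]               # adapt
        phi = [p + xi - pp for p, xi, pp in zip(psi, x, psi_prev)]    # correct
        mixed = ring_mix(phi)
        x = [0.5 * (f + m) for f, m in zip(phi, mixed)]               # combine with (I + W)/2
        psi_prev = psi
    return x


def max_error(x, x_star):
    """Largest distance of any node's iterate from the global minimizer."""
    return max(abs(xi - x_star) for xi in x)
```

With `noise_std=0.0`, calling `exact_diffusion(h, c, 0.1, 3000)` on the output of `make_problem(10)` should drive every node to the global minimizer, whereas `d_sgd` with the same arguments settles at a point biased by the heterogeneity of the local costs. Setting `noise_std` above zero adds gradient noise to both methods, which is the stochastic setting the abstract analyzes.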
