  3. 2403.15654
The Effectiveness of Local Updates for Decentralized Learning under Data Heterogeneity

23 March 2024
Tongle Wu
Ying Sun
Abstract

We revisit two fundamental decentralized optimization methods, Decentralized Gradient Tracking (DGT) and Decentralized Gradient Descent (DGD), with multiple local updates. We consider two settings and demonstrate that incorporating $K > 1$ local update steps can reduce communication complexity. Specifically, for $\mu$-strongly convex and $L$-smooth loss functions, we prove that local DGT achieves communication complexity $\tilde{\mathcal{O}}\Big(\frac{L}{\mu K} + \frac{\delta}{\mu (1 - \rho)} + \frac{\rho}{(1 - \rho)^2} \cdot \frac{L + \delta}{\mu}\Big)$, where $\rho$ measures the network connectivity and $\delta$ measures the second-order heterogeneity of the local losses. Our result reveals the tradeoff between communication and computation and shows that increasing $K$ can effectively reduce communication costs when the data heterogeneity is low and the network is well connected. We then consider the over-parameterization regime where the local losses share the same minima, and prove that employing local updates in DGD, even without gradient correction, can yield a similar effect as DGT in reducing communication complexity. Numerical experiments validate our theoretical results.
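
For intuition, below is a minimal sketch of decentralized gradient descent with $K$ local update steps between communications (local DGD, without the gradient-tracking correction). The quadratic local losses, ring-topology mixing matrix, agent count, step size, and round count are hypothetical choices for illustration only and do not reproduce the paper's algorithms or experiments.

```python
# Illustrative sketch of local DGD: each agent takes K local gradient steps,
# then one gossip/averaging step over the network per communication round.
# All problem data and hyperparameters below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim, K, rounds, lr = 5, 10, 4, 50, 0.05

# Agent i holds a strongly convex quadratic loss f_i(x) = 0.5 * ||A_i x - b_i||^2.
A = [rng.standard_normal((20, dim)) for _ in range(n_agents)]
b = [rng.standard_normal(20) for _ in range(n_agents)]

def local_grad(i, x):
    return A[i].T @ (A[i] @ x - b[i])

# Symmetric doubly stochastic mixing matrix for a ring topology.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

X = np.zeros((n_agents, dim))              # one iterate per agent (rows)
for r in range(rounds):                    # each round costs one communication
    for _ in range(K):                     # K local gradient steps, no communication
        X = X - lr * np.stack([local_grad(i, X[i]) for i in range(n_agents)])
    X = W @ X                              # averaging (gossip) step over neighbors

# Diagnostics: consensus error and gradient norm at the network average.
x_bar = X.mean(axis=0)
print("consensus error:", np.linalg.norm(X - x_bar))
print("avg grad norm:", np.linalg.norm(
    sum(local_grad(i, x_bar) for i in range(n_agents)) / n_agents))
```

In this sketch, increasing K trades extra local computation for fewer communication rounds, which is the tradeoff the abstract quantifies; how far this helps depends on the heterogeneity of the local losses and the connectivity of the mixing matrix.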

View on arXiv