Faster Convergence of Local SGD for Over-Parameterized Models

30 January 2022
Tiancheng Qin
S. Rasoul Etesami
César A. Uribe
    FedML
arXiv:2201.12719 [PDF, HTML]
Abstract

Modern machine learning architectures are often highly expressive. They are usually over-parameterized and can interpolate the data by driving the empirical loss close to zero. We analyze the convergence of Local SGD (or FedAvg) for such over-parameterized models in the heterogeneous data setting and improve upon the existing literature by establishing the following convergence rates. For general convex loss functions, we establish an error bound of $\mathcal{O}(1/T)$ under a mild data similarity assumption and an error bound of $\mathcal{O}(K/T)$ otherwise, where $K$ is the number of local steps and $T$ is the total number of iterations. For non-convex loss functions, we prove an error bound of $\mathcal{O}(K/T)$. These bounds improve upon the best previous bound of $\mathcal{O}(1/\sqrt{nT})$ in both cases, where $n$ is the number of nodes, when no assumption on the model being over-parameterized is made. We complete our results by providing problem instances in which our established convergence rates are tight up to a constant factor for a reasonably small stepsize. Finally, we validate our theoretical results with large-scale numerical experiments that reveal the convergence behavior of Local SGD for practical over-parameterized deep learning models, in which the $\mathcal{O}(1/T)$ convergence rate of Local SGD is clearly shown.
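
Below is a minimal sketch of the Local SGD (FedAvg) loop the paper analyzes: each of $n$ nodes runs $K$ local SGD steps between communication rounds, and the resulting iterates are averaged. The synthetic interpolating least-squares problem and the choices of n, K, T, and the step size are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Minimal Local SGD (FedAvg) sketch: n nodes each run K local SGD steps
# on their own data, then their iterates are averaged. The quadratic
# problem, n, K, T, and the step size are illustrative assumptions only.

rng = np.random.default_rng(0)
n, K, T = 8, 10, 2000          # nodes, local steps per round, total iterations
d, m = 20, 5                   # model dimension, samples per node (m < d)
eta = 0.01                     # step size

# Heterogeneous, over-parameterized data: each node holds its own (A_i, b_i),
# but a shared interpolating solution x_star exists, so the global empirical
# loss can be driven to (near) zero.
A = [rng.normal(size=(m, d)) for _ in range(n)]
x_star = rng.normal(size=d)
b = [Ai @ x_star for Ai in A]

x = np.zeros(d)                # global model
for t in range(0, T, K):       # one communication round every K local steps
    local = []
    for i in range(n):
        xi = x.copy()
        for _ in range(K):     # K local SGD steps on node i's data
            j = rng.integers(m)                    # sample one local data point
            grad = (A[i][j] @ xi - b[i][j]) * A[i][j]
            xi -= eta * grad
        local.append(xi)
    x = np.mean(local, axis=0) # FedAvg step: average the local iterates

loss = sum(0.5 * np.mean((Ai @ x - bi) ** 2) for Ai, bi in zip(A, b)) / n
print(f"average empirical loss after {T} iterations: {loss:.2e}")
```

In this interpolation regime the averaged empirical loss decays toward zero as rounds accumulate, which is the qualitative behavior behind the $\mathcal{O}(1/T)$ and $\mathcal{O}(K/T)$ rates stated in the abstract.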
