
Multi-Timescale Gradient Sliding for Distributed Optimization

18 June 2025
Junhui Zhang
Patrick Jaillet
Main: 28 Pages
5 Figures
Bibliography: 4 Pages
1 Table
Appendix: 9 Pages
Abstract

We propose two first-order methods for convex, non-smooth, distributed optimization problems, hereafter called Multi-Timescale Gradient Sliding (MT-GS) and its accelerated variant (AMT-GS). MT-GS and AMT-GS can take advantage of similarities between (local) objectives to reduce the number of communication rounds, are flexible so that different subsets (of agents) can communicate at different, user-picked rates, and are fully deterministic. These three desirable features are achieved through a block-decomposable primal-dual formulation and a multi-timescale variant of the sliding method introduced in Lan et al. (2020) and Lan (2016), where different dual blocks are updated at potentially different rates. To find an $\epsilon$-suboptimal solution, the complexities of our algorithms achieve optimal dependency on $\epsilon$: MT-GS needs $O(\overline{r}A/\epsilon)$ communication rounds and $O(\overline{r}/\epsilon^2)$ subgradient steps for Lipschitz objectives, and AMT-GS needs $O(\overline{r}A/\sqrt{\epsilon\mu})$ communication rounds and $O(\overline{r}/(\epsilon\mu))$ subgradient steps if the objectives are also $\mu$-strongly convex. Here, $\overline{r}$ measures the "average rate of updates" for dual blocks, and $A$ measures similarities between (subgradients of) local functions. In addition, the linear dependency of communication rounds on $A$ is optimal (Arjevani and Shamir 2015), thereby providing a positive answer to the open question of whether such dependency is achievable for non-smooth objectives (Arjevani and Shamir 2015).
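
To make the stated rates concrete, the following minimal Python sketch evaluates only the leading-order terms quoted in the abstract. It is an illustration, not the authors' algorithm: the function names `mt_gs_orders` and `amt_gs_orders` are hypothetical, and every constant hidden by the $O(\cdot)$ notation is assumed to be 1, so only the scaling in $\epsilon$, $\mu$, $\overline{r}$, and $A$ is meaningful.

```python
import math


def mt_gs_orders(eps, r_bar, A):
    """Leading-order MT-GS terms from the abstract (hidden constants assumed to be 1).

    Returns (communication rounds, subgradient steps) for Lipschitz objectives.
    """
    comm_rounds = r_bar * A / eps      # O(r_bar * A / eps)
    subgrad_steps = r_bar / eps ** 2   # O(r_bar / eps^2)
    return comm_rounds, subgrad_steps


def amt_gs_orders(eps, mu, r_bar, A):
    """Leading-order AMT-GS terms from the abstract (hidden constants assumed to be 1).

    Returns (communication rounds, subgradient steps) when the objectives
    are also mu-strongly convex.
    """
    comm_rounds = r_bar * A / math.sqrt(eps * mu)  # O(r_bar * A / sqrt(eps * mu))
    subgrad_steps = r_bar / (eps * mu)             # O(r_bar / (eps * mu))
    return comm_rounds, subgrad_steps


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for eps in (0.5, 0.25, 0.125):
        print(eps, mt_gs_orders(eps, r_bar=2.0, A=0.5),
              amt_gs_orders(eps, mu=1.0, r_bar=2.0, A=0.5))
```

Under these assumptions, halving $\epsilon$ doubles the MT-GS communication term and quadruples its subgradient term, whereas the AMT-GS communication term doubles only when $\epsilon$ is reduced by a factor of four. In both cases a smaller similarity measure $A$ lowers the communication term linearly and leaves the subgradient term unchanged.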
