
Distributed Optimization Based on Gradient-tracking Revisited: Enhancing Convergence Rate via Surrogation

Abstract

We study distributed multiagent optimization over (directed, time-varying) graphs. We consider the minimization of $F+G$ subject to convex constraints, where $F$ is the smooth strongly convex sum of the agents' losses and $G$ is a nonsmooth convex function. We build on the SONATA algorithm: the algorithm employs surrogate objective functions in the agents' subproblems (thus going beyond linearization, such as proximal-gradient) coupled with a perturbed (push-sum) consensus mechanism that aims to track locally the gradient of $F$. SONATA achieves precision $\epsilon>0$ on the objective value in $\mathcal{O}(\kappa_g \log(1/\epsilon))$ gradient computations at each node and $\tilde{\mathcal{O}}\big(\kappa_g (1-\rho)^{-1/2} \log(1/\epsilon)\big)$ communication steps, where $\kappa_g$ is the condition number of $F$ and $\rho$ characterizes the connectivity of the network. This is the first linear-rate result for distributed composite optimization; it also improves on existing (non-accelerated) schemes that minimize $F$ alone, whose rates depend on quantities much larger than $\kappa_g$ (e.g., the worst-case condition number among the agents). When considering, in particular, empirical risk minimization problems with statistically similar data across the agents, SONATA employing high-order surrogates achieves precision $\epsilon>0$ in $\mathcal{O}\big((\beta/\mu) \log(1/\epsilon)\big)$ iterations and $\tilde{\mathcal{O}}\big((\beta/\mu) (1-\rho)^{-1/2} \log(1/\epsilon)\big)$ communication steps, where $\beta$ measures the degree of similarity of the agents' losses and $\mu$ is the strong convexity constant of $F$. Therefore, when $\beta/\mu < \kappa_g$, the use of high-order surrogates yields provably faster rates than what is achievable with first-order models, without exchanging any Hessian matrix over the network.
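To make the gradient-tracking mechanism concrete: with $G=0$, an undirected graph with doubly stochastic mixing weights (so no push-sum correction is needed), and first-order (linearized) surrogates, the scheme reduces to the classical gradient-tracking iteration sketched below. This is an illustrative toy, not the paper's general setting: the quadratic losses $f_i(x)=\tfrac{1}{2}(x-b_i)^2$, the 4-node ring topology, and the step size are our assumptions.

```python
import numpy as np

def gradient_tracking(b, W, alpha=0.2, iters=200):
    """Toy gradient tracking: n agents minimize sum_i 0.5*(x - b_i)^2,
    whose minimizer is mean(b). Each agent keeps an iterate x_i and a
    tracker y_i that estimates the average gradient via perturbed consensus."""
    n = len(b)
    x = np.zeros(n)                  # local iterates x_i
    grad = x - b                     # grad f_i(x_i) = x_i - b_i
    y = grad.copy()                  # trackers initialized to local gradients
    for _ in range(iters):
        x = W @ x - alpha * y        # consensus step + descent along tracked gradient
        new_grad = x - b
        y = W @ y + new_grad - grad  # perturbed consensus: y_i tracks the avg gradient
        grad = new_grad
    return x

b = np.array([1.0, 2.0, 3.0, 4.0])
# Ring graph with symmetric, doubly stochastic weights (our assumption).
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
x = gradient_tracking(b, W)
print(x)  # all agents approach the global minimizer mean(b) = 2.5
```

The tracker update preserves the invariant $\sum_i y_i = \sum_i \nabla f_i(x_i)$, which is what lets every agent descend along an estimate of the *global* gradient using only neighbor communication; SONATA generalizes the descent step by replacing the linearization with richer surrogate models of the local loss.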
