
Distributed Non-Convex First-Order Optimization and Information Processing: Lower Complexity Bounds and Rate Optimal Algorithms

Abstract

We consider a class of distributed non-convex optimization problems that often arise in modern signal and information processing applications, in which a number of agents connected by a network $\mathcal{G}$ collectively optimize a sum of smooth (possibly non-convex) local objective functions. We address the following question: for a class of unconstrained problems, what is the fastest rate any distributed algorithm can achieve when the agents use only local gradient information and interact only with their immediate neighbors, and how can those rates be achieved? First, for a class of problems whose local functions all have Lipschitz continuous gradients, and for a class of undirected and unweighted graphs, we develop a lower bound analysis that identifies difficult problem instances for a class of properly defined distributed first-order methods. We show that for this class of algorithms, in the worst case it takes at least $\mathcal{O}(1/\sqrt{\xi(\mathcal{G})} \times \bar{L}/\epsilon)$ iterations to achieve a certain $\epsilon$-solution, where $\xi(\mathcal{G})$ denotes the spectral gap of the graph Laplacian matrix and $\bar{L}$ is the average of the Lipschitz constants of the gradients of the local functions. Second, we propose optimal methods whose rates precisely match the lower bounds (up to a polylog factor). The key to our algorithm design is to properly combine classical polynomial filtering methods with modern first-order optimization techniques. To the best of our knowledge, this is the first time that lower rate bounds and optimal methods have been developed for distributed non-convex optimization problems. Our results provide guidelines for the future design of distributed optimization algorithms, convex and non-convex alike.
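To make the "polynomial filtering plus first-order step" idea concrete, below is a minimal sketch of one standard way these ingredients combine: each agent takes a local gradient step, then the iterates are approximately averaged by applying a Chebyshev polynomial of a symmetric doubly stochastic mixing matrix $W$ (each multiplication by $W$ is one round of neighbor-to-neighbor communication). This is an illustration under assumptions, not the paper's actual algorithm; the function names `chebyshev_filter` and `decentralized_gd`, the plain gradient step, and the centrally computed eigenvalue bound are all hypothetical choices made for the sketch.

```python
import numpy as np

def chebyshev_filter(W, X, K):
    """Apply the degree-K Chebyshev filter p_K(W) = T_K(W/lam) / T_K(1/lam)
    to the rows of X, where T_k is the Chebyshev polynomial of the first kind
    and lam bounds the magnitude of the non-unit eigenvalues of W.
    Since p_K(1) = 1, the network average is preserved while the
    disagreement components are damped."""
    eigs = np.sort(np.linalg.eigvalsh(W))
    lam = max(abs(eigs[0]), abs(eigs[-2]))  # largest non-unit eigenvalue magnitude
    # Three-term Chebyshev recursion: Z_k = T_k(W/lam) X, t_k = T_k(1/lam).
    Z_prev, Z = X, (W @ X) / lam
    t_prev, t = 1.0, 1.0 / lam
    for _ in range(K - 1):
        Z_prev, Z = Z, (2.0 / lam) * (W @ Z) - Z_prev
        t_prev, t = t, (2.0 / lam) * t - t_prev
    return Z / t

def decentralized_gd(local_grads, W, x0, step, T, K):
    """Illustrative decentralized first-order scheme: T outer iterations,
    each consisting of one local gradient step per agent followed by
    K communication rounds of Chebyshev-filtered mixing."""
    n = W.shape[0]
    X = np.tile(np.asarray(x0, dtype=float), (n, 1))  # one row of iterates per agent
    for _ in range(T):
        # Each agent evaluates only the gradient of its own local function.
        G = np.stack([g(x) for g, x in zip(local_grads, X)])
        X = chebyshev_filter(W, X - step * G, K)
    return X.mean(axis=0)
```

The design intuition behind such a combination: plain gossip needs on the order of $1/\xi(\mathcal{G})$ mixing rounds per averaging step, whereas a degree-$K$ Chebyshev filter achieves comparable averaging accuracy with $K$ on the order of $1/\sqrt{\xi(\mathcal{G})}$ rounds, which is consistent with the $1/\sqrt{\xi(\mathcal{G})}$ factor appearing in the lower bound above.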
