
$\ell_p$-Regression in the Arbitrary Partition Model of Communication

Abstract

We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model, for $p \in (0,2]$. In this problem, there is a coordinator and $s$ servers. The $i$-th server receives $A^i \in \{-M, -M+1, \ldots, M\}^{n\times d}$ and $b^i \in \{-M, -M+1, \ldots, M\}^n$, and the coordinator would like to find a $(1+\epsilon)$-approximate solution to $\min_{x\in\mathbb{R}^d} \|(\sum_i A^i)x - (\sum_i b^i)\|_p$. Here $M \leq \mathrm{poly}(nd)$ for convenience. This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits. For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ upper bound. Notably, for $d$ sufficiently large, our leading order term only depends linearly on $1/\epsilon$ rather than quadratically. We also show communication lower bounds of $\Omega(sd^2 + sd/\epsilon^2)$ for $p \in (0,1]$ and $\Omega(sd^2 + sd/\epsilon)$ for $p \in (1,2]$. Our bounds considerably improve previous bounds due to (Woodruff et al., COLT, 2013) and (Vempala et al., SODA, 2020).
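To make the problem setup concrete, the following is a minimal Python sketch of the arbitrary partition model for the $p = 2$ case. It is not the protocol from the paper: it shows only the naive baseline in which every server ships its entire input to the coordinator, which then solves the regression on the summed data. All function names, the toy sizes, and the seed are assumptions made for illustration, and the solver assumes the summed matrix has full column rank.

```python
import random

def lp_cost(A, b, x, p):
    """Return ||Ax - b||_p for a list-of-lists A and lists b, x."""
    residuals = [sum(a * xj for a, xj in zip(row, x)) - bi
                 for row, bi in zip(A, b)]
    return sum(abs(r) ** p for r in residuals) ** (1.0 / p)

def least_squares(A, b):
    """Solve the p = 2 case via the normal equations (A^T A) x = A^T b,
    using Gauss-Jordan elimination. Assumes full column rank."""
    d = len(A[0])
    # Augmented system [A^T A | A^T b].
    G = [[sum(row[i] * row[j] for row in A) for j in range(d)]
         + [sum(row[i] * bi for row, bi in zip(A, b))]
         for i in range(d)]
    for c in range(d):
        piv = max(range(c, d), key=lambda r: abs(G[r][c]))
        G[c], G[piv] = G[piv], G[c]
        for r in range(d):
            if r != c:
                f = G[r][c] / G[c][c]
                G[r] = [gr - f * gc for gr, gc in zip(G[r], G[c])]
    return [G[i][d] / G[i][i] for i in range(d)]

# Hypothetical toy instance: s servers, each holding an n x d integer
# matrix and a length-n integer vector with entries in {-M, ..., M}.
rng = random.Random(0)
s, n, d, M = 3, 8, 2, 5
A_shares = [[[rng.randint(-M, M) for _ in range(d)] for _ in range(n)]
            for _ in range(s)]
b_shares = [[rng.randint(-M, M) for _ in range(n)] for _ in range(s)]

# The regression instance is defined by the entrywise sums of the shares.
A = [[sum(vals) for vals in zip(*rows)] for rows in zip(*A_shares)]
b = [sum(vals) for vals in zip(*b_shares)]

x_opt = least_squares(A, b)
opt = lp_cost(A, b, x_opt, 2)

def is_approx(x, eps):
    """Check whether x is a (1 + eps)-approximate solution for p = 2."""
    return lp_cost(A, b, x, 2) <= (1 + eps) * opt

# Naive baseline cost: each server sends all n*(d+1) entries, each of
# which takes one of 2M+1 values and so fits in (2M).bit_length() bits.
bits_per_entry = (2 * M).bit_length()
naive_bits = s * n * (d + 1) * bits_per_entry
```

The naive cost grows with $n$, whereas the bounds stated in the abstract are expressed in terms of $s$, $d$, and $\epsilon$ (up to the factors hidden by the tilde notation), which is the gap that a communication-efficient protocol is meant to close.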
