
Distributed Sparse Normal Means Estimation with Sublinear Communication

Abstract

We consider the problem of sparse normal means estimation in a distributed setting with communication constraints. We assume there are $M$ machines, each holding $d$-dimensional observations of a $K$-sparse vector $\mu$ corrupted by additive Gaussian noise. The $M$ machines are connected in a star topology to a fusion center, whose goal is to estimate the vector $\mu$ with a low communication budget. Previous works have shown that to achieve the centralized minimax rate for the $\ell_2$ risk, the total communication must be high: at least linear in the dimension $d$. This phenomenon occurs, however, at very weak signals. We show that at signal-to-noise ratios (SNRs) that are sufficiently high, but not enough for recovery by any individual machine, the support of $\mu$ can be correctly recovered with significantly less communication. Specifically, we present two algorithms for distributed estimation of a sparse mean vector corrupted by either Gaussian or sub-Gaussian noise. We then prove that above certain SNR thresholds, with high probability, these algorithms recover the correct support with total communication that is sublinear in the dimension $d$. Furthermore, the communication decreases exponentially as a function of signal strength. If in addition $KM \ll \tfrac{d}{\log d}$, then with an additional round of sublinear communication, our algorithms achieve the centralized rate for the $\ell_2$ risk. Finally, we present simulations that illustrate the performance of our algorithms in different parameter regimes.
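To make the setting concrete, below is a minimal Python sketch of the distributed model described above: $M$ machines each observe the same $K$-sparse mean vector $\mu$ plus independent Gaussian noise, and a fusion center combines short messages from the machines. The voting scheme and the toy values of $d$, $K$, $M$, and the per-machine list length $L$ are illustrative assumptions, not the algorithms analyzed in the paper; it is meant only to show how support recovery with short per-machine messages, followed by one extra round of averaging on the estimated support, could look in code.

```python
import numpy as np

# Toy illustration of the distributed sparse-mean setting (not the paper's
# algorithms): each of M machines observes the K-sparse mean vector mu plus
# independent Gaussian noise, sends only a short list of candidate
# coordinates to the fusion center, which estimates the support by voting
# and then averages the reported values on that support.

rng = np.random.default_rng(0)

d, K, M = 10_000, 10, 50      # dimension, sparsity, number of machines (assumed toy values)
snr = 2.5                     # per-machine signal strength, in noise standard deviations

# K-sparse mean vector mu
support = rng.choice(d, size=K, replace=False)
mu = np.zeros(d)
mu[support] = snr

# Local observations: y_m = mu + standard Gaussian noise, one row per machine
Y = mu + rng.standard_normal((M, d))

# Round 1: each machine sends the indices of its L largest |y| entries
L = 2 * K                     # per-machine message far shorter than d
votes = np.zeros(d, dtype=int)
for m in range(M):
    top = np.argpartition(np.abs(Y[m]), -L)[-L:]
    votes[top] += 1

# Fusion center: keep the K coordinates receiving the most votes
support_hat = np.argpartition(votes, -K)[-K:]

# Round 2: machines send their values on the estimated support; center averages
mu_hat = np.zeros(d)
mu_hat[support_hat] = Y[:, support_hat].mean(axis=0)

print("support recovered:", set(support_hat) == set(support))
print("l2 error:", np.linalg.norm(mu_hat - mu))
```

In this sketch each machine communicates only $O(K \log d)$ bits in the first round, which is sublinear in $d$, and the second round adds averaging over the estimated support to reduce the $\ell_2$ error toward the centralized rate.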
