Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality

24 June 2015
M. Braverman
A. Garg
Tengyu Ma
Huy Le Nguyen
David P. Woodruff
Abstract

We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions. In the distributed sparse Gaussian mean estimation problem, each of the $m$ machines receives $n$ data points from a $d$-dimensional Gaussian distribution with unknown mean $\theta$, which is promised to be $k$-sparse. The machines communicate by message passing and aim to estimate the mean $\theta$. We provide a tight (up to logarithmic factors) tradeoff between the estimation error and the number of bits communicated between the machines. This directly leads to a lower bound for the distributed \textit{sparse linear regression} problem: to achieve the statistical minimax error, the total communication is at least $\Omega(\min\{n,d\}m)$, where $n$ is the number of observations that each machine receives and $d$ is the ambient dimension. These lower bounds improve upon [Sha14, SD'14] by allowing a multi-round iterative communication model. We also give the first optimal simultaneous protocol in the dense case for mean estimation. As our main technique, we prove a \textit{distributed data processing inequality}, a generalization of the usual data processing inequalities, which may be of independent interest and useful for other problems.
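For concreteness, the setting described in the abstract can be written out schematically as follows. This is a restatement, not the paper's formal theorem; in particular, the isotropic covariance $\sigma^2 I_d$ is an assumption made here for illustration, since the abstract does not specify the covariance.

% Setup: m machines, each holding n i.i.d. samples from a d-dimensional
% Gaussian with unknown k-sparse mean \theta (covariance \sigma^2 I_d
% assumed for illustration).
\[
  X_i^{(1)}, \dots, X_i^{(n)} \overset{\text{i.i.d.}}{\sim}
      \mathcal{N}\bigl(\theta, \sigma^2 I_d\bigr),
  \qquad \|\theta\|_0 \le k, \qquad i = 1, \dots, m.
\]
% Lower bound stated in the abstract for distributed sparse linear
% regression: achieving the statistical minimax error requires total
% communication
\[
  B \;=\; \Omega\bigl(\min\{n, d\}\, m\bigr) \text{ bits (up to logarithmic factors).}
\]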
