Breaking The Dimension Dependence in Sparse Distribution Estimation under Communication Constraints

Abstract

We consider the problem of estimating a $d$-dimensional $s$-sparse discrete distribution from its samples observed under a $b$-bit communication constraint. The best-known previous result on $\ell_2$ estimation error for this problem is $O\left( \frac{s\log\left( {d}/{s}\right)}{n2^b}\right)$. Surprisingly, we show that when sample size $n$ exceeds a minimum threshold $n^*(s, d, b)$, we can achieve an $\ell_2$ estimation error of $O\left( \frac{s}{n2^b}\right)$. This implies that when $n>n^*(s, d, b)$ the convergence rate does not depend on the ambient dimension $d$ and is the same as knowing the support of the distribution beforehand. We next ask the question: "what is the minimum $n^*(s, d, b)$ that allows dimension-free convergence?" To upper bound $n^*(s, d, b)$, we develop novel localization schemes to accurately and efficiently localize the unknown support. For the non-interactive setting, we show that $n^*(s, d, b) = O\left( \min \left( {d^2\log^2 d}/{2^b}, {s^4\log^2 d}/{2^b}\right) \right)$. Moreover, we connect the problem with non-adaptive group testing and obtain a polynomial-time estimation scheme when $n = \tilde{\Omega}\left({s^4\log^4 d}/{2^b}\right)$. This group testing based scheme is adaptive to the sparsity parameter $s$, and hence can be applied without knowing it. For the interactive setting, we propose a novel tree-based estimation scheme and show that the minimum sample-size needed to achieve dimension-free convergence can be further reduced to $n^*(s, d, b) = \tilde{O}\left( {s^2\log^2 d}/{2^b} \right)$.
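
To get a feel for the scalings quoted above, the following is a minimal numeric sketch, not the paper's estimation scheme. It evaluates the stated bounds with all hidden constants set to 1 and the polylogarithmic factors hidden by the tilde notation dropped, so the outputs are only order-of-magnitude indications. The function names and the example values of $s$, $d$, $n$, $b$ are illustrative assumptions, and the natural logarithm is used since the base does not matter inside big-O.

```python
import math

# All functions below drop the constants (and any polylog factors hidden by
# the tilde notation) in the abstract's bounds; they are illustrative only.

def prior_rate(s, d, n, b):
    """Previous best-known l2 error scaling: s * log(d/s) / (n * 2^b)."""
    return s * math.log(d / s) / (n * 2**b)

def dimension_free_rate(s, n, b):
    """Dimension-free l2 error scaling: s / (n * 2^b)."""
    return s / (n * 2**b)

def threshold_noninteractive(s, d, b):
    """Non-interactive threshold scaling: min(d^2, s^4) * log^2(d) / 2^b."""
    return min(d**2, s**4) * math.log(d) ** 2 / 2**b

def threshold_interactive(s, d, b):
    """Interactive threshold scaling: s^2 * log^2(d) / 2^b."""
    return s**2 * math.log(d) ** 2 / 2**b

# Hypothetical example values, chosen only for illustration.
s, d, n, b = 10, 10**6, 10**9, 4

print("prior rate          :", prior_rate(s, d, n, b))
print("dimension-free rate :", dimension_free_rate(s, n, b))
print("non-interactive n*  :", threshold_noninteractive(s, d, b))
print("interactive n*      :", threshold_interactive(s, d, b))
```

Under these simplifications the two error scalings differ by exactly the factor $\log(d/s)$, and when $s^2 \le d$ the interactive threshold is smaller than the non-interactive one by a factor of $s^2$.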
