Estimating the Effective Support Size in Constant Query Complexity

Estimating the support size of a distribution is a well-studied problem in statistics. Motivated by the fact that this problem is highly non-robust (small perturbations of the distribution can drastically change the support size) and thus hard to solve, Goldreich [ECCC 2019] studied the query complexity of estimating the $\epsilon$-\emph{effective support size} of a distribution $P$, which is the smallest support size of a distribution that is within total variation distance $\epsilon$ of $P$. In his paper, he gives an algorithm in the dual access setting (where we may both receive random samples and query the sampling probability $p(x)$ of any element $x$) for a bicriteria approximation, in which both the distance parameter $\epsilon$ and the returned support size are allowed some slack. However, his algorithm has either super-constant query complexity in the support size or super-constant approximation ratio. He then asked whether this is necessary, or whether a constant-factor approximation is possible with a number of queries independent of the support size. We answer his question by showing that a query complexity independent of the support size is possible not only for a constant-factor approximation but also without any relaxation of the approximation ratio, that is, the bicriteria relaxation is not necessary. Specifically, we give an algorithm for this stronger guarantee whose query complexity depends only on the accuracy parameters. We also bound the query complexity of the approximate version, in which a relaxed approximation ratio is allowed. Our algorithm is very simple and takes only a few short lines of pseudocode.
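
To make the definition concrete, the following sketch computes the effective support size exactly for a fully known distribution. It relies on the observation that the closest distribution supported on $k$ elements keeps the $k$ heaviest elements of $P$, so its total variation distance from $P$ equals the mass of the remaining elements. The function name `effective_support_size` and the example distribution are illustrative assumptions, not taken from the paper, and this is not the constant-query algorithm of the paper, which accesses $P$ only through samples and probability queries.

```python
from fractions import Fraction
from typing import Sequence


def effective_support_size(probs: Sequence[Fraction], eps: Fraction) -> int:
    """Return the smallest k such that the k heaviest elements carry mass >= 1 - eps.

    Assumes `probs` is a full probability vector (non-negative, summing to 1).
    Exact rational arithmetic avoids floating-point issues at the threshold.
    """
    mass = 0
    # Greedily add elements from heaviest to lightest.
    for k, p in enumerate(sorted(probs, reverse=True), start=1):
        mass += p
        if mass >= 1 - eps:
            return k
    return len(probs)


# Hypothetical example: two heavy elements plus 1000 elements of mass 10^-6 each.
probs = [Fraction(1, 2), Fraction(499, 1000)] + [Fraction(1, 10**6)] * 1000

print(effective_support_size(probs, Fraction(0)))       # 1002 (the plain support size)
print(effective_support_size(probs, Fraction(1, 100)))  # 2
```

The example illustrates the non-robustness mentioned above: the light elements carry only $0.001$ of the mass yet account for almost all of the support size, whereas the effective support size with $\epsilon = 0.01$ ignores them.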