In classical statistics and distribution testing, it is often assumed that elements can be sampled from some distribution $P$, and that when an element $x$ is sampled, the probability $P(x)$ of sampling $x$ is also known. Recent work in distribution testing has shown that many algorithms are robust in the sense that they still produce correct output if the elements are drawn from any distribution $Q$ that is sufficiently close to $P$. This phenomenon raises interesting questions: under what conditions is a "noisy" distribution $Q$ sufficient, and what is the algorithmic cost of coping with this noise? We investigate these questions for the problem of estimating the sum of a multiset of $N$ real values $x_1, \ldots, x_N$. This problem is well-studied in the statistical literature in the case $P = Q$, where the Hansen-Hurwitz estimator is frequently used. We assume that for some known distribution $P$, values are sampled from a distribution $Q$ that is pointwise close to $P$. For every positive integer $k$ we define an estimator $\zeta_k$ for $\mu = \sum_i x_i$ whose bias is proportional to $\gamma^k$ (where our $\zeta_1$ reduces to the classical Hansen-Hurwitz estimator). As a special case, we show that if $Q$ is pointwise $\gamma$-close to uniform and all $x_i \in \{0, 1\}$, then for any $\epsilon > 0$ we can estimate $\mu$ to within additive error $\epsilon N$ using $m = \Theta(N^{1 - 1/k} / \epsilon^{2/k})$ samples, where $k = \lceil (\lg \epsilon) / (\lg \gamma) \rceil$. We show that this sample complexity is essentially optimal. Our bounds show that the sample complexity need not vary uniformly with the desired error parameter $\epsilon$: for some values of $\epsilon$, perturbations in its value have no asymptotic effect on the sample complexity, while for other values, any decrease in its value results in an asymptotically larger sample complexity.
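To make the noise-free baseline concrete, the following is a minimal Python sketch of the classical Hansen-Hurwitz estimator: it draws indices with replacement and averages each sampled value divided by its assumed sampling probability. The function names `hansen_hurwitz` and `expected_estimate` and the parameter `sampling_probs` are hypothetical, introduced only for illustration. The sketch does not implement the higher-order estimators described in the abstract; it only shows how dividing by the known probabilities while actually sampling from a different distribution shifts the expected estimate away from the true sum.

```python
import random


def hansen_hurwitz(values, probs, num_samples, rng, sampling_probs=None):
    """Classical Hansen-Hurwitz estimate of sum(values).

    values:         the multiset of real values (illustrative input)
    probs:          the known distribution, used as the divisor
    sampling_probs: the distribution actually sampled from; defaults to
                    probs, which is the noise-free setting
    """
    actual = probs if sampling_probs is None else sampling_probs
    # Draw indices with replacement according to the actual distribution.
    indices = rng.choices(range(len(values)), weights=actual, k=num_samples)
    # Average value / (assumed probability) over the sample.
    return sum(values[i] / probs[i] for i in indices) / num_samples


def expected_estimate(values, probs, sampling_probs):
    """Exact expectation of the estimator: sum over i of q_i * x_i / p_i."""
    return sum(q * x / p for x, p, q in zip(values, probs, sampling_probs))


if __name__ == "__main__":
    values = [1, 0, 1, 1]
    uniform = [0.25, 0.25, 0.25, 0.25]
    noisy = [0.3, 0.2, 0.3, 0.2]  # hypothetical perturbation of uniform
    rng = random.Random(0)
    print(hansen_hurwitz(values, uniform, 20000, rng))
    print(expected_estimate(values, uniform, noisy))
```

When the sampling distribution equals the known one, each term of the expectation collapses to the value itself, so the estimator is unbiased. Under the perturbed distribution the expectation generally differs from the true sum, which is the bias the abstract sets out to reduce.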