ResearchTrend.AI

arXiv:2208.01197
Bias Reduction for Sum Estimation

2 August 2022
T. Eden
Jakob Bæk Tejs Houen
Shyam Narayanan
Will Rosenbaum
Jakub Tětek
Abstract

In classical statistics and distribution testing, it is often assumed that elements can be sampled from some distribution P, and that when an element x is sampled, the probability P(x) of sampling x is also known. Recent work in distribution testing has shown that many algorithms are robust in the sense that they still produce correct output if the elements are drawn from any distribution Q that is sufficiently close to P. This phenomenon raises interesting questions: under what conditions is a "noisy" distribution Q sufficient, and what is the algorithmic cost of coping with this noise? We investigate these questions for the problem of estimating the sum of a multiset of N real values x_1, …, x_N. This problem is well studied in the statistical literature in the case P = Q, where the Hansen-Hurwitz estimator is frequently used. We assume that for some known distribution P, values are sampled from a distribution Q that is pointwise close to P. For every positive integer k we define an estimator ζ_k for μ = ∑_i x_i whose bias is proportional to γ^k (where our ζ_1 reduces to the classical Hansen-Hurwitz estimator). As a special case, we show that if Q is pointwise γ-close to uniform and all x_i ∈ {0, 1}, then for any ε > 0 we can estimate μ to within additive error εN using m = Θ(N^{1 − 1/k} / ε^{2/k}) samples, where k = ⌈(log ε)/(log γ)⌉. We show that this sample complexity is essentially optimal.
Our bounds show that the sample complexity need not vary uniformly with the desired error parameter ε: for some values of ε, perturbations in its value have no asymptotic effect on the sample complexity, while for other values, any decrease in its value results in an asymptotically larger sample complexity.
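The ζ_1 base case mentioned above is the classical Hansen-Hurwitz estimator: draw m elements with replacement and average x/P(x) over the draws, which is unbiased for μ = ∑_i x_i when samples really come from P. A minimal sketch for the uniform-P, x_i ∈ {0, 1} setting discussed in the abstract (the function and variable names are illustrative, not from the paper, and this does not implement the bias-reduced ζ_k for k > 1):

```python
import random

def hansen_hurwitz(sample_with_prob, m):
    """Hansen-Hurwitz estimator (the zeta_1 case).

    sample_with_prob() returns one draw (x, p), where x is the sampled
    value and p is its assumed sampling probability under P.
    The estimate of mu = sum_i x_i is the average of x / p over m draws.
    """
    total = 0.0
    for _ in range(m):
        x, p = sample_with_prob()
        total += x / p
    return total / m

# Hypothetical data: N binary values with true sum mu = 6.
random.seed(0)  # fixed seed so the example is reproducible
values = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
N = len(values)

def draw_uniform():
    # Sampling exactly from the assumed P: uniform, so P(i) = 1/N.
    i = random.randrange(N)
    return values[i], 1 / N

estimate = hansen_hurwitz(draw_uniform, m=10_000)
```

Here the draws come from P itself, so the estimate concentrates around μ = 6; the paper's setting replaces `draw_uniform` with draws from a nearby distribution Q, which is what introduces the γ-dependent bias that ζ_k reduces.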

View on arXiv
Comments on this paper