Online Robust Mean Estimation

24 October 2023
arXiv:2310.15932
Daniel M. Kane
Ilias Diakonikolas
Hanshen Xiao
Sihan Liu
Abstract

We study the problem of high-dimensional robust mean estimation in an online setting. Specifically, we consider a scenario where $n$ sensors are measuring some common, ongoing phenomenon. At each time step $t = 1, 2, \ldots, T$, the $i^{th}$ sensor reports its readings $x^{(i)}_t$ for that time step. The algorithm must then commit to its estimate $\mu_t$ for the true mean value of the process at time $t$. We assume that most of the sensors observe independent samples from some common distribution $X$, but an $\epsilon$-fraction of them may instead behave maliciously. The algorithm wishes to compute a good approximation $\mu$ to the true mean $\mu^\ast := \mathbf{E}[X]$. We note that if the algorithm is allowed to wait until time $T$ to report its estimate, this reduces to the well-studied problem of robust mean estimation. However, the requirement that our algorithm produces partial estimates as the data is coming in substantially complicates the situation. We prove two main results about online robust mean estimation in this model. First, if the uncorrupted samples satisfy the standard condition of $(\epsilon,\delta)$-stability, we give an efficient online algorithm that outputs estimates $\mu_t$, $t \in [T]$, such that with high probability it holds that $\|\mu - \mu^\ast\|_2 = O(\delta \log(T))$, where $\mu = (\mu_t)_{t \in [T]}$. We note that this error bound is nearly competitive with the best offline algorithms, which would achieve $\ell_2$-error of $O(\delta)$. Our second main result shows that with additional assumptions on the input (most notably that $X$ is a product distribution) there are inefficient algorithms whose error does not depend on $T$ at all.
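To make the online model concrete, the sketch below (not from the paper) simulates the measurement setting described in the abstract: at each time step the algorithm receives readings from $n$ sensors, an $\epsilon$-fraction of which may be adversarial, and must commit to an estimate $\mu_t$ before later rounds arrive. The Gaussian data, the constant corruptions, and the coordinate-wise median aggregator are all illustrative assumptions; the paper's actual algorithm and its $O(\delta \log(T))$ guarantee are not reproduced here.

```python
# Minimal sketch of the online contamination model, with a coordinate-wise
# median as a stand-in robust aggregator (NOT the paper's algorithm).
import numpy as np

rng = np.random.default_rng(0)

n, d, T = 100, 5, 50      # sensors, dimension, time horizon (illustrative values)
eps = 0.1                 # fraction of corrupted sensors
true_mean = np.zeros(d)   # mu* := E[X]
bad = rng.choice(n, size=int(eps * n), replace=False)   # fixed set of corrupted sensors

estimates = []
for t in range(T):
    # Honest sensors report independent samples from X (here X ~ N(mu*, I)).
    readings = rng.normal(loc=true_mean, scale=1.0, size=(n, d))
    # Corrupted sensors may report arbitrary values instead.
    readings[bad] = 100.0
    # The algorithm must commit to mu_t now, before seeing any future rounds.
    mu_t = np.median(readings, axis=0)   # placeholder robust aggregator
    estimates.append(mu_t)

mu = np.stack(estimates)                 # the sequence mu = (mu_t)_{t in [T]}
per_round_err = np.linalg.norm(mu - true_mean, axis=1)
print("worst per-round l2 error:", per_round_err.max())
```

In this toy setup the per-round median already suppresses the constant corruptions; the difficulty the paper addresses is achieving dimension-independent error guarantees under the weaker $(\epsilon,\delta)$-stability condition while remaining online.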
