arXiv:1708.02728
Optimal Identity Testing with High Probability

9 August 2017
Ilias Diakonikolas, Themis Gouleakis, John Peebles, Eric Price
Abstract

We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution $p$ over $n$ elements, an explicitly given distribution $q$, and parameters $0 < \epsilon, \delta < 1$, we wish to distinguish, {\em with probability at least $1-\delta$}, whether the distributions are identical versus $\epsilon$-far in total variation distance. Most prior work focused on the case that $\delta = \Omega(1)$, for which the sample complexity of identity testing is known to be $\Theta(\sqrt{n}/\epsilon^2)$. Given such an algorithm, one can achieve arbitrarily small values of $\delta$ via black-box amplification, which multiplies the required number of samples by $\Theta(\log(1/\delta))$. We show that black-box amplification is suboptimal for any $\delta = o(1)$, and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is \[ \Theta\left( \frac{1}{\epsilon^2}\left(\sqrt{n \log(1/\delta)} + \log(1/\delta) \right)\right) \] for any $n$, $\epsilon$, and $\delta$. For the special case of uniformity testing, where the given distribution is the uniform distribution $U_n$ over the domain, our new tester is surprisingly simple: to test whether $p = U_n$ versus $d_{\mathrm{TV}}(p, U_n) \geq \epsilon$, we simply threshold $d_{\mathrm{TV}}(\widehat{p}, U_n)$, where $\widehat{p}$ is the empirical probability distribution. The fact that this simple "plug-in" estimator is sample-optimal is surprising, even in the constant $\delta$ case. Indeed, it was believed that such a tester would not attain sublinear sample complexity even for constant values of $\epsilon$ and $\delta$.
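The plug-in uniformity tester described in the abstract can be sketched in a few lines: compute the empirical distribution $\widehat{p}$ from the samples and threshold its total variation distance to $U_n$. This is a minimal illustration, not the paper's analysis; the function name and the `threshold` parameter are assumptions here, and the paper's proofs determine how the threshold must be calibrated as a function of $n$, $\epsilon$, $\delta$, and the sample size.

```python
from collections import Counter

def uniformity_test_plugin(samples, n, threshold):
    """Sketch of the plug-in uniformity tester: accept iff the total
    variation distance between the empirical distribution and the
    uniform distribution U_n over {0, ..., n-1} is at most `threshold`.

    `threshold` is a calibration parameter; its correct setting comes
    from the paper's analysis and is not reproduced here.
    """
    m = len(samples)
    counts = Counter(samples)
    # d_TV(p_hat, U_n) = (1/2) * sum_i |p_hat(i) - 1/n|
    tv = 0.5 * sum(abs(counts.get(i, 0) / m - 1.0 / n) for i in range(n))
    return tv <= threshold
```

For example, a perfectly balanced sample over $n = 4$ elements gives an empirical TV distance of 0 and is accepted, while a sample concentrated on a single element has TV distance $0.75$ from $U_4$ and is rejected for any small threshold.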
