arXiv:2205.06069

Sequential algorithms for testing identity and closeness of distributions

12 May 2022
Omar Fawzi
Nicolas Flammarion
Aurélien Garivier
Aadil Oufkir
Abstract

What advantage do \emph{sequential} procedures provide over batch algorithms for testing properties of unknown distributions? Focusing on the problem of testing whether two distributions $\mathcal{D}_1$ and $\mathcal{D}_2$ on $\{1,\dots,n\}$ are equal or $\epsilon$-far, we give several answers to this question. We show that for a small alphabet size $n$, there is a sequential algorithm that outperforms any batch algorithm by a factor of at least $4$ in terms of sample complexity. For a general alphabet size $n$, we give a sequential algorithm that uses no more samples than its batch counterpart, and possibly fewer if the actual distance $TV(\mathcal{D}_1, \mathcal{D}_2)$ between $\mathcal{D}_1$ and $\mathcal{D}_2$ is larger than $\epsilon$. As a corollary, letting $\epsilon$ go to $0$, we obtain a sequential algorithm for testing closeness, when no a priori bound on $TV(\mathcal{D}_1, \mathcal{D}_2)$ is given, that has a sample complexity of $\tilde{\mathcal{O}}\left(\frac{n^{2/3}}{TV(\mathcal{D}_1, \mathcal{D}_2)^{4/3}}\right)$: this improves over the $\tilde{\mathcal{O}}\left(\frac{n/\log n}{TV(\mathcal{D}_1, \mathcal{D}_2)^{2}}\right)$ tester of \cite{daskalakis2017optimal} and is optimal up to multiplicative constants. We also establish limitations of sequential algorithms for the problem of testing identity and closeness: they can improve the worst-case number of samples by at most a constant factor.
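
To get a feel for the two rates quoted above, the following is a minimal sketch that evaluates them numerically. It is not the paper's algorithm: the function names are hypothetical, and all multiplicative constants and polylogarithmic factors hidden by the $\tilde{\mathcal{O}}$ notation are set to 1, so the outputs indicate orders of magnitude only.

```python
import math


def tv_distance(p, q):
    """Total variation distance between two distributions on {1, ..., n},
    given as equal-length sequences of probabilities."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same alphabet size")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def rate_sequential(n, tv):
    """n^(2/3) / TV^(4/3): the rate quoted for the sequential tester.
    Assumption: constants and log factors are dropped."""
    return n ** (2 / 3) / tv ** (4 / 3)


def rate_prior(n, tv):
    """(n / log n) / TV^2: the rate quoted for the earlier tester.
    Assumption: constants and log factors are dropped; requires n >= 2."""
    return (n / math.log(n)) / tv ** 2


if __name__ == "__main__":
    # Illustrative values only, not taken from the paper.
    n, tv = 10**6, 0.1
    print(f"sequential rate: {rate_sequential(n, tv):.3g}")
    print(f"prior rate:      {rate_prior(n, tv):.3g}")
```

For a large alphabet and a moderate distance, as in the illustrative values above, the first rate is the smaller of the two; with constants and logarithmic factors dropped, the comparison is not meaningful for very small $n$.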
