Testing probability distributions using conditional samples

Abstract

We study a new framework for property testing of probability distributions, by considering distribution testing algorithms that have access to a conditional sampling oracle.* This is an oracle that takes as input a subset $S \subseteq [N]$ of the domain $[N]$ of the unknown probability distribution $D$ and returns a draw from the conditional probability distribution $D$ restricted to $S$. This new model allows considerable flexibility in the design of distribution testing algorithms; in particular, testing algorithms in this model can be adaptive. We study a wide range of natural distribution testing problems in this new framework and some of its variants, giving both upper and lower bounds on query complexity. These problems include testing whether $D$ is the uniform distribution $\mathcal{U}$; testing whether $D = D^\ast$ for an explicitly provided $D^\ast$; testing whether two unknown distributions $D_1$ and $D_2$ are equivalent; and estimating the variation distance between $D$ and the uniform distribution. At a high level our main finding is that the new "conditional sampling" framework we consider is a powerful one: while all the problems mentioned above have $\Omega(\sqrt{N})$ sample complexity in the standard model (and in some cases the complexity must be almost linear in $N$), we give $\mathrm{poly}(\log N, 1/\epsilon)$-query algorithms (and in some cases $\mathrm{poly}(1/\epsilon)$-query algorithms independent of $N$) for all these problems in our conditional sampling setting.

*Independently from our work, Chakraborty et al. also considered this framework. We discuss their work in Subsection 1.4.
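
To make the oracle described above concrete, here is a minimal Python sketch of a simulated conditional sampling oracle. It is an illustration under stated assumptions, not the paper's construction: the names `CondOracle`, `sample`, and `estimate_pair_ratio` are hypothetical, the distribution is assumed to be known to the simulator as an explicit table, and raising an error when $D(S) = 0$ is an assumed convention that the abstract does not specify.

```python
import random
from typing import Dict, Iterable, Optional


class CondOracle:
    """Hypothetical simulator of a conditional sampling oracle.

    `probs` maps each domain element to its probability under D.
    A real tester would only see the samples, never this table.
    """

    def __init__(self, probs: Dict[int, float], seed: Optional[int] = None):
        self.probs = dict(probs)
        self.rng = random.Random(seed)
        self.queries = 0  # number of oracle calls made so far

    def sample(self, subset: Iterable[int]) -> int:
        """Return one draw from D restricted to `subset`."""
        elems = sorted(set(subset))
        if not elems:
            raise ValueError("query set must be non-empty")
        weights = [self.probs.get(x, 0.0) for x in elems]
        if sum(weights) == 0:
            # Assumed convention: the abstract does not say what happens
            # when D(S) = 0, so this sketch simply rejects the query.
            raise ValueError("query set has zero probability under D")
        self.queries += 1
        # random.choices normalizes the weights, which is exactly
        # conditioning D on the query set.
        return self.rng.choices(elems, weights=weights, k=1)[0]


def estimate_pair_ratio(oracle: CondOracle, x: int, y: int, trials: int) -> float:
    """Estimate D(x) / (D(x) + D(y)) by querying the two-element set {x, y}."""
    hits = sum(1 for _ in range(trials) if oracle.sample({x, y}) == x)
    return hits / trials


if __name__ == "__main__":
    oracle = CondOracle({1: 0.5, 2: 0.25, 3: 0.25}, seed=0)
    print(estimate_pair_ratio(oracle, 1, 2, trials=20000))
```

Querying a two-element set, as `estimate_pair_ratio` does, compares the relative weight of two points directly, which plain sampling from $D$ cannot do cheaply when both points are rare. Because each query set can depend on earlier answers, a tester built on such an oracle can be adaptive, as the abstract notes.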
