
Likelihood-free hypothesis testing

Abstract

Consider the problem of binary hypothesis testing. Given $Z$ coming from either $\mathbb P^{\otimes m}$ or $\mathbb Q^{\otimes m}$, to decide between the two with small probability of error it is sufficient, and in many cases necessary, to have $m \asymp 1/\epsilon^2$, where $\epsilon$ measures the separation between $\mathbb P$ and $\mathbb Q$ in total variation ($\mathsf{TV}$). Achieving this, however, requires complete knowledge of the distributions and can be done, for example, using the Neyman-Pearson test. In this paper we consider a variation of the problem which we call likelihood-free hypothesis testing, where access to $\mathbb P$ and $\mathbb Q$ is given through $n$ i.i.d. observations from each. In the case when $\mathbb P$ and $\mathbb Q$ are assumed to belong to a non-parametric family, we demonstrate the existence of a fundamental trade-off between $n$ and $m$ given by $nm \asymp n_{\mathsf{GoF}}^2(\epsilon)$, where $n_{\mathsf{GoF}}(\epsilon)$ is the minimax sample complexity of testing between the hypotheses $H_0:\, \mathbb P=\mathbb Q$ vs $H_1:\, \mathsf{TV}(\mathbb P,\mathbb Q)\geq\epsilon$. We show this for three families of distributions, in addition to the family of all discrete distributions, for which we obtain a more complicated trade-off exhibiting an additional phase transition. Our results demonstrate the possibility of testing without fully estimating $\mathbb P$ and $\mathbb Q$, provided $m \gg 1/\epsilon^2$.
