
Priv'IT: Private and Sample Efficient Identity Testing

Abstract

We develop differentially private hypothesis testing methods for the small sample regime. Given a sample $\mathcal{D}$ from a categorical distribution $p$ over some domain $\Sigma$, an explicitly described distribution $q$ over $\Sigma$, some privacy parameter $\varepsilon$, accuracy parameter $\alpha$, and requirements $\beta_{\rm I}$ and $\beta_{\rm II}$ for the type I and type II errors of our test, the goal is to distinguish between $p=q$ and $d_{\rm TV}(p,q) \geq \alpha$. We provide theoretical bounds for the sample size $|\mathcal{D}|$ so that our method both satisfies $(\varepsilon,0)$-differential privacy, and guarantees $\beta_{\rm I}$ and $\beta_{\rm II}$ type I and type II errors. We show that differential privacy may come for free in some regimes of parameters, and we always beat the sample complexity resulting from running the $\chi^2$-test with noisy counts, or standard approaches such as repetition for endowing non-private $\chi^2$-style statistics with differential privacy guarantees. We experimentally compare the sample complexity of our method to that of recently proposed methods for private hypothesis testing.
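
For orientation, the noisy-counts baseline that the abstract compares against can be sketched as follows. This is not the paper's method: it is a minimal illustration that assumes the textbook Laplace mechanism applied to the histogram of the sample (L1 sensitivity 2 under replacement of one sample), followed by the classical Pearson $\chi^2$ statistic. The function names `laplace_noise`, `total_variation`, and `noisy_chi2_statistic` are hypothetical, the domain $\Sigma$ is assumed to be `{0, ..., k-1}`, and every entry of `q` is assumed to be positive. Choosing a rejection threshold that meets the $\beta_{\rm I}$ and $\beta_{\rm II}$ requirements is left out.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def total_variation(p, q):
    # d_TV(p, q) = (1/2) * sum_i |p_i - q_i| for distributions on the same domain.
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def noisy_chi2_statistic(sample, q, epsilon, rng):
    """Pearson chi-squared statistic computed from Laplace-noised counts.

    Assumption (not from the paper): replacing one sample changes two
    counts by 1 each, so the histogram has L1 sensitivity 2 and adding
    Laplace(2 / epsilon) noise to every count gives (epsilon, 0)-DP.
    The statistic is post-processing of the noisy histogram.
    """
    n = len(sample)
    counts = Counter(sample)
    scale = 2.0 / epsilon
    stat = 0.0
    for i, qi in enumerate(q):
        noisy_count = counts.get(i, 0) + laplace_noise(scale, rng)
        expected = n * qi  # expected count under the null hypothesis p = q
        stat += (noisy_count - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.25, 0.25, 0.25, 0.25]
    sample = rng.choices(range(len(q)), weights=q, k=1000)
    print(noisy_chi2_statistic(sample, q, epsilon=1.0, rng=rng))
```

Under these assumptions, the added noise inflates the variance of the statistic, which is one way to see why this baseline needs more samples than a test designed with privacy in mind.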
