Near-Optimal Closeness Testing of Discrete Histogram Distributions

Abstract

We investigate the problem of testing the equivalence between two discrete histograms. A {\em $k$-histogram} over $[n]$ is a probability distribution that is piecewise constant over some set of $k$ intervals over $[n]$. Histograms have been extensively studied in computer science and statistics. Given a set of samples from two $k$-histogram distributions $p, q$ over $[n]$, we want to distinguish (with high probability) between the cases that $p = q$ and $\|p-q\|_1 \geq \epsilon$. The main contribution of this paper is a new algorithm for this testing problem and a nearly matching information-theoretic lower bound. Specifically, the sample complexity of our algorithm matches our lower bound up to a logarithmic factor, improving on previous work by polynomial factors in the relevant parameters. Our algorithmic approach applies in a more general setting and yields improved sample upper bounds for testing closeness of other structured distributions as well.
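To make the definitions above concrete, the following is a minimal sketch of a $k$-histogram over $[n]$ and of the $\ell_1$ distance that separates the two cases of the testing problem. It is illustrative only and does not implement the paper's testing algorithm. The function names `make_histogram` and `l1_distance`, the 0-based indexing of $[n]$, and the example parameters are assumptions introduced here.

```python
def make_histogram(n, breakpoints, masses):
    """Return a k-histogram over {0, ..., n-1} as a list of n probabilities.

    breakpoints: the k-1 sorted interior cut points (hypothetical parameter).
    masses: total probability assigned to each of the k intervals.
    Within each interval the mass is spread uniformly, so the
    distribution is piecewise constant.
    """
    edges = [0, *breakpoints, n]
    if len(masses) != len(edges) - 1:
        raise ValueError("need exactly one mass per interval")
    p = []
    for lo, hi, mass in zip(edges[:-1], edges[1:], masses):
        width = hi - lo
        p.extend([mass / width] * width)
    return p


def l1_distance(p, q):
    """Compute ||p - q||_1 for two distributions on the same domain."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Two 2-histograms over a domain of size 8 with different breakpoints.
p = make_histogram(8, [4], [0.5, 0.5])
q = make_histogram(8, [2], [0.5, 0.5])
distance = l1_distance(p, q)
```

In this example `distance` is 0.5 (up to floating-point rounding), so for any $\epsilon \leq 0.5$ the pair falls in the $\|p-q\|_1 \geq \epsilon$ case. A tester only sees samples from `p` and `q`, not the probability vectors themselves.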
