Optimal Algorithms and Lower Bounds for Testing Closeness of Structured Distributions

Abstract

We give a general unified method that can be used for $L_1$ {\em closeness testing} of a wide range of univariate structured distribution families. More specifically, we design a sample optimal and computationally efficient algorithm for testing the equivalence of two unknown (potentially arbitrary) univariate distributions under the $\mathcal{A}_k$-distance metric: Given sample access to distributions with density functions $p, q: I \to \mathbb{R}$, we want to distinguish between the cases that $p = q$ and $\|p-q\|_{\mathcal{A}_k} \ge \epsilon$ with probability at least $2/3$. We show that for any $k \ge 2, \epsilon > 0$, the {\em optimal} sample complexity of the $\mathcal{A}_k$-closeness testing problem is $\Theta(\max\{ k^{4/5}/\epsilon^{6/5}, k^{1/2}/\epsilon^2 \})$. This is the first $o(k)$ sample algorithm for this problem, and yields new, simple $L_1$ closeness testers, in most cases with optimal sample complexity, for broad classes of structured distributions.
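
To get a feel for the stated bound, the following minimal sketch evaluates the two terms inside the $\max$ and reports which one dominates. It is an illustration only: the function names are hypothetical, and the constants hidden by $\Theta(\cdot)$ are ignored, so the returned values indicate scaling, not actual sample counts. Equating the two terms gives the crossover $k = \epsilon^{-8/3}$; above it the $k^{4/5}/\epsilon^{6/5}$ term is the larger one.

```python
import math


def closeness_sample_terms(k: int, eps: float) -> tuple[float, float]:
    """Return the two terms of the bound, with Theta-constants dropped."""
    if k < 2 or eps <= 0:
        raise ValueError("the bound is stated for k >= 2 and eps > 0")
    term_a = k ** (4 / 5) / eps ** (6 / 5)  # k^{4/5} / eps^{6/5}
    term_b = k ** (1 / 2) / eps ** 2        # k^{1/2} / eps^2
    return term_a, term_b


def closeness_sample_bound(k: int, eps: float) -> float:
    """max{k^{4/5}/eps^{6/5}, k^{1/2}/eps^2}, up to constant factors."""
    return max(closeness_sample_terms(k, eps))


def crossover_k(eps: float) -> float:
    """Value of k at which the two terms are equal: k = eps^{-8/3}."""
    return eps ** (-8 / 3)


if __name__ == "__main__":
    for k, eps in [(100, 0.5), (4, 0.1), (10_000, 0.1)]:
        a, b = closeness_sample_terms(k, eps)
        dominant = "k^{4/5}/eps^{6/5}" if a >= b else "k^{1/2}/eps^2"
        print(f"k={k}, eps={eps}: bound ~ {max(a, b):.1f} (dominant: {dominant})")
```

In both regimes the bound grows sublinearly in $k$, which is consistent with the abstract's $o(k)$ claim for fixed $\epsilon$.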
