
Learning Coverage Functions

Abstract

We study the problem of approximating and learning coverage functions. A function $c: 2^{[n]} \rightarrow \mathbb{R}^{+}$ is a coverage function if there exists a universe $U$ with non-negative weights $w(u)$ for each $u \in U$ and subsets $A_1, A_2, \ldots, A_n$ of $U$ such that $c(S) = \sum_{u \in \cup_{i \in S} A_i} w(u)$. Alternatively, coverage functions can be described as non-negative linear combinations of monotone disjunctions. They are a natural subclass of submodular functions and arise in a number of applications. We show that over the uniform distribution, coverage functions with range $[0,1]$ are PAC learnable to $\ell_1$-error of $\epsilon$ in $\mathrm{poly}(n, 1/\epsilon)$ time and using $\mathrm{poly}(\log n, 1/\epsilon)$ random examples. We also show a proper learning algorithm for coverage functions whose running time is polynomial in the size of the universe over which the coverage function is defined. Our algorithm is based on several new structural properties of the Fourier spectrum of coverage functions and, in particular, we prove that any coverage function can be $\epsilon$-approximated in $\ell_1$ by a coverage function that depends only on $1/\epsilon^2$ variables. In contrast, we show that, without assumptions on the distribution, learning coverage functions is at least as hard as learning polynomial-size disjoint DNF formulas. Our PAC learning algorithm on the uniform distribution implies the first polynomial-time differentially private algorithm for releasing monotone disjunction queries with low average error over the uniform distribution on disjunctions. This problem was first considered by Gupta et al. (2011), and the best previous algorithm runs in time $n^{O(\log(1/\alpha))}$, where $\alpha$ is the accuracy of release (Cheraghchi et al., 2012). Further, our proper learning algorithm implies that the queries can be released using a synthetic database in time $\mathrm{poly}(n) \log(1/\alpha)^{O(\log(1/\alpha))}$.
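As a small illustration of the definition above, the following Python sketch evaluates a coverage function on a toy instance in two ways: directly from the set system, and as a non-negative linear combination of monotone disjunctions. The universe, weights, sets, and function names are hypothetical choices made for this example and do not come from the paper; the sketch shows only the definition, not the learning algorithms.

```python
from itertools import combinations

# Hypothetical toy instance (not from the paper):
# universe U = {"a", "b", "c"} with non-negative weights w(u),
# and n = 3 subsets A_1, A_2, A_3 of U.
weights = {"a": 0.5, "b": 0.3, "c": 0.2}
sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}


def coverage(S):
    """c(S) = total weight of the union of A_i over i in S."""
    covered = set().union(*(sets[i] for i in S))
    return sum(weights[u] for u in covered)


def coverage_via_disjunctions(S):
    """Same function, written as sum_u w(u) * OR_{i : u in A_i} x_i,
    where x_i = 1 exactly when i is in S."""
    return sum(w for u, w in weights.items()
               if any(u in sets[i] for i in S))


def all_subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_submodular(f, ground, tol=1e-12):
    """Brute-force check of diminishing returns:
    f(S + i) - f(S) >= f(T + i) - f(T) for all S subset of T, i not in T."""
    for T in all_subsets(ground):
        for S in all_subsets(T):
            for i in set(ground) - T:
                gain_S = f(S | {i}) - f(S)
                gain_T = f(T | {i}) - f(T)
                if gain_S < gain_T - tol:
                    return False
    return True
```

On this instance the two evaluations agree on every subset of $\{1, 2, 3\}$, and the brute-force check confirms the diminishing-returns property that makes coverage functions submodular. The check enumerates all pairs of subsets, so it is only practical for very small $n$.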
