80
32

Representation, Approximation and Learning of Submodular Functions Using Low-rank Decision Trees

Abstract

We study the complexity of approximate representation and learning of submodular functions over the uniform distribution on the Boolean hypercube {0,1}n\{0,1\}^n. Our main result is the following structural theorem: any submodular function is ϵ\epsilon-close in 2\ell_2 to a real-valued decision tree (DT) of depth O(1/ϵ2)O(1/\epsilon^2). This immediately implies that any submodular function is ϵ\epsilon-close to a function of at most 2O(1/ϵ2)2^{O(1/\epsilon^2)} variables and has a spectral 1\ell_1 norm of 2O(1/ϵ2)2^{O(1/\epsilon^2)}. It also implies the closest previous result that states that submodular functions can be approximated by polynomials of degree O(1/ϵ2)O(1/\epsilon^2) (Cheraghchi et al., 2012). Our result is proved by constructing an approximation of a submodular function by a DT of rank 4/ϵ24/\epsilon^2 and a proof that any rank-rr DT can be ϵ\epsilon-approximated by a DT of depth 52(r+log(1/ϵ))\frac{5}{2}(r+\log(1/\epsilon)). We show that these structural results can be exploited to give an attribute-efficient PAC learning algorithm for submodular functions running in time O~(n2)2O(1/ϵ4)\tilde{O}(n^2) \cdot 2^{O(1/\epsilon^{4})}. The best previous algorithm for the problem requires nO(1/ϵ2)n^{O(1/\epsilon^{2})} time and examples (Cheraghchi et al., 2012) but works also in the agnostic setting. In addition, we give improved learning algorithms for a number of related settings. We also prove that our PAC and agnostic learning algorithms are essentially optimal via two lower bounds: (1) an information-theoretic lower bound of 2Ω(1/ϵ2/3)2^{\Omega(1/\epsilon^{2/3})} on the complexity of learning monotone submodular functions in any reasonable model; (2) computational lower bound of nΩ(1/ϵ2/3)n^{\Omega(1/\epsilon^{2/3})} based on a reduction to learning of sparse parities with noise, widely-believed to be intractable. These are the first lower bounds for learning of submodular functions over the uniform distribution.

View on arXiv
Comments on this paper