Properly Learning Poisson Binomial Distributions in Almost Polynomial Time

We give an algorithm for properly learning Poisson binomial distributions. A Poisson binomial distribution (PBD) of order $n$ is the discrete probability distribution of the sum of $n$ mutually independent Bernoulli random variables. Given $\widetilde{O}(1/\epsilon^2)$ samples from an unknown PBD $\mathbf{p}$, our algorithm runs in time $(1/\epsilon)^{O(\log \log (1/\epsilon))}$ and outputs a hypothesis PBD that is $\epsilon$-close to $\mathbf{p}$ in total variation distance. The previously best known running time for properly learning PBDs was $(1/\epsilon)^{O(\log (1/\epsilon))}$. As one of our main contributions, we provide a novel structural characterization of PBDs. We prove that, for all $\epsilon > 0$, there exists an explicit collection $\mathcal{M}$ of $(1/\epsilon)^{O(\log \log (1/\epsilon))}$ vectors of multiplicities, such that for any PBD $\mathbf{p}$ there exists a PBD $\mathbf{q}$ with $O(\log (1/\epsilon))$ distinct parameters whose multiplicities are given by some element of $\mathcal{M}$, such that $\mathbf{q}$ is $\epsilon$-close to $\mathbf{p}$. Our proof combines tools from Fourier analysis and algebraic geometry. Our approach to the proper learning problem is as follows: starting with an accurate non-proper hypothesis, we fit a PBD to this hypothesis. More specifically, we essentially start with the hypothesis computed by the computationally efficient non-proper learning algorithm in our recent work~\cite{DKS15}. Our aforementioned structural characterization allows us to reduce the corresponding fitting problem to a collection of systems of low-degree polynomial inequalities. We show that each such system can be solved in time $(1/\epsilon)^{O(\log \log (1/\epsilon))}$, which yields the overall running time of our algorithm.
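To make the objects in the abstract concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It computes the probability mass function of a PBD by iterated convolution, builds a PBD from a short list of distinct parameters together with their multiplicities, and measures total variation distance between two PBDs. The function names (`pbd_pmf`, `pbd_from_multiplicities`, `total_variation`) and the numeric parameters in the usage lines are hypothetical choices made for illustration.

```python
def pbd_pmf(probs):
    """PMF of the sum of independent Bernoulli(p_i) variables.

    Returns a list of length len(probs) + 1 whose k-th entry is Pr[sum = k].
    Computed by convolving in one Bernoulli variable at a time.
    """
    pmf = [1.0]  # the empty sum is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # new variable equals 0
            nxt[k + 1] += mass * p      # new variable equals 1
        pmf = nxt
    return pmf


def pbd_from_multiplicities(values, multiplicities):
    """PBD in which the parameter values[i] is repeated multiplicities[i] times."""
    probs = [p for p, m in zip(values, multiplicities) for _ in range(m)]
    return pbd_pmf(probs)


def total_variation(pmf_a, pmf_b):
    """Total variation distance between two PMFs on {0, 1, 2, ...}."""
    size = max(len(pmf_a), len(pmf_b))
    a = pmf_a + [0.0] * (size - len(pmf_a))  # pad the shorter PMF with zeros
    b = pmf_b + [0.0] * (size - len(pmf_b))
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical example: a PBD of order 4 compared with a PBD that has only
# two distinct parameters, each with multiplicity 2.
p = pbd_pmf([0.10, 0.12, 0.50, 0.52])
q = pbd_from_multiplicities([0.11, 0.51], [2, 2])
print(total_variation(p, q))
```

The structural result concerns approximations of this second kind: a target PBD is replaced by one with few distinct parameters whose multiplicity vector comes from an explicit collection. The sketch only evaluates how close one such candidate is; it does not search for it.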