
TURF: A Two-factor, Universal, Robust, Fast Distribution Learning Algorithm

Abstract

Approximating distributions from their samples is a canonical statistical-learning problem. One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than that of its closest $t$-piece degree-$d$ polynomial, where $t\ge1$ and $d\ge0$. Letting $c_{t,d}$ denote the smallest such factor, clearly $c_{1,0}=1$, and it can be shown that $c_{t,d}\ge 2$ for all other $t$ and $d$. Yet current computationally efficient algorithms show only $c_{t,1}\le 2.25$, and the bound rises quickly to $c_{t,d}\le 3$ for $d\ge 9$. We derive a near-linear-time and essentially sample-optimal estimator that establishes $c_{t,d}=2$ for all $(t,d)\ne(1,0)$. Additionally, for many practical distributions, the lowest approximation distance is achieved by polynomials with vastly varying numbers of pieces. We provide a method that estimates this number near-optimally, hence helping approach the best possible approximation. Experiments combining the two techniques confirm improved performance over existing methodologies.
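To make the approximation target concrete, the following is a minimal illustrative sketch (not the TURF estimator described above): it builds an empirical density from samples, least-squares fits a degree-$d$ polynomial on each of $t$ equal-width pieces, and reports the $\ell_1$ distance between the two. All function and parameter names here are hypothetical choices for illustration.

```python
import numpy as np

def t_piece_poly_l1(samples, t=4, d=2, bins=400):
    """Toy sketch: fit a t-piece, degree-d polynomial to an empirical
    histogram density and return their l1 distance (not TURF itself)."""
    lo, hi = samples.min(), samples.max()
    hist, edges = np.histogram(samples, bins=bins, range=(lo, hi), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    approx = np.empty_like(hist)
    # split the support into t equal-width pieces; on each piece,
    # least-squares fit a degree-d polynomial to the histogram heights
    bounds = np.linspace(lo, hi, t + 1)
    for i in range(t):
        mask = (centers >= bounds[i]) & (centers <= bounds[i + 1])
        if mask.sum() <= d:          # too few bins to fit: copy through
            approx[mask] = hist[mask]
            continue
        coeffs = np.polyfit(centers[mask], hist[mask], d)
        approx[mask] = np.polyval(coeffs, centers[mask])
    width = centers[1] - centers[0]
    return np.sum(np.abs(hist - approx)) * width  # approximate l1 distance

rng = np.random.default_rng(0)
samples = rng.normal(size=10_000)
print(t_piece_poly_l1(samples))  # smaller means a better piecewise fit
```

The quantity returned plays the role of the piecewise-polynomial approximation distance; the paper's contribution is achieving a factor-2 guarantee against the best such fit, efficiently, and choosing the number of pieces $t$ near-optimally.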
