Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms

We study the problem of robustly learning multi-dimensional histograms. A $d$-dimensional function $h: D \to \mathbb{R}$ is called a $k$-histogram if there exists a partition of the domain $D \subseteq \mathbb{R}^d$ into $k$ axis-aligned rectangles such that $h$ is constant within each such rectangle. Let $f: D \to \mathbb{R}$ be a $d$-dimensional probability density function and suppose that $f$ is $\mathrm{OPT}$-close, in $L_1$-distance, to an unknown $k$-histogram (with unknown partition). Our goal is to output a hypothesis that is $O(\mathrm{OPT}) + \epsilon$ close to $f$, in $L_1$-distance. We give an algorithm for this learning problem that uses $n = \widetilde{O}_d(k/\epsilon^2)$ samples and runs in time $\widetilde{O}_d(n)$. For any fixed dimension, our algorithm has optimal sample complexity, up to logarithmic factors, and runs in near-linear time. Prior to our work, the time complexity of the $d = 1$ case was well-understood, but significant gaps in our understanding remained even for $d = 2$.
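To make the object of study concrete, here is a minimal sketch, not taken from the paper, of a $d = 2$, $k = 3$ histogram density over the unit square, with a sampler producing the i.i.d. draws a learner would receive. All rectangle bounds, values, and helper names are hypothetical.

```python
# Illustrative sketch (assumed, not the paper's algorithm): a d = 2, k = 3
# histogram represented as (axis-aligned rectangle, constant value) pairs.
import random

# Each piece: ((low_x, low_y), (high_x, high_y), value). The three rectangles
# partition the unit square, and the total mass integrates to 1, so h is a density.
PIECES = [
    ((0.0, 0.0), (0.5, 1.0), 1.2),  # left half: area 0.5, mass 0.6
    ((0.5, 0.0), (1.0, 0.5), 0.8),  # lower-right quarter: area 0.25, mass 0.2
    ((0.5, 0.5), (1.0, 1.0), 0.8),  # upper-right quarter: area 0.25, mass 0.2
]

def histogram_density(x: float, y: float) -> float:
    """Evaluate the k-histogram h at (x, y): constant within each rectangle."""
    for (lx, ly), (hx, hy), value in PIECES:
        if lx <= x < hx and ly <= y < hy:
            return value
    return 0.0  # outside the domain

# Sanity check: the pieces' masses (area * value) sum to 1.
total = sum((hx - lx) * (hy - ly) * v for (lx, ly), (hx, hy), v in PIECES)
assert abs(total - 1.0) < 1e-9

def sample() -> tuple[float, float]:
    """Draw one point from h: pick a rectangle by mass, then sample uniformly in it."""
    masses = [(hx - lx) * (hy - ly) * v for (lx, ly), (hx, hy), v in PIECES]
    (lx, ly), (hx, hy), _ = random.choices(PIECES, weights=masses)[0]
    return random.uniform(lx, hx), random.uniform(ly, hy)

print(histogram_density(0.25, 0.5), sample())
```

In the learning problem above, the algorithm sees only such samples from an $f$ that is merely $\mathrm{OPT}$-close to some $k$-histogram, and neither the rectangles nor their values are known in advance.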