
Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms

Abstract

We study the problem of robustly learning multi-dimensional histograms. A $d$-dimensional function $h: D \rightarrow \mathbb{R}$ is called a $k$-histogram if there exists a partition of the domain $D \subseteq \mathbb{R}^d$ into $k$ axis-aligned rectangles such that $h$ is constant within each such rectangle. Let $f: D \rightarrow \mathbb{R}$ be a $d$-dimensional probability density function and suppose that $f$ is $\mathrm{OPT}$-close, in $L_1$-distance, to an unknown $k$-histogram (with unknown partition). Our goal is to output a hypothesis that is $O(\mathrm{OPT}) + \epsilon$ close to $f$ in $L_1$-distance. We give an algorithm for this learning problem that uses $n = \tilde{O}_d(k/\epsilon^2)$ samples and runs in time $\tilde{O}_d(n)$. For any fixed dimension, our algorithm has optimal sample complexity, up to logarithmic factors, and runs in near-linear time. Prior to our work, the time complexity of the $d = 1$ case was well-understood, but significant gaps in our understanding remained even for $d = 2$.
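To make the definition concrete, here is a minimal illustrative sketch (not the paper's algorithm) of a $2$-dimensional $k$-histogram with $k = 3$ rectangular pieces on the unit square, together with a Monte Carlo estimate of its $L_1$-distance to a reference density. The rectangle encoding and all names here are our own assumptions for illustration.

```python
import numpy as np

# A 2-dimensional k-histogram: a piecewise-constant function over k
# axis-aligned rectangles partitioning D = [0, 1)^2.
# Hypothetical rectangle format: (x_lo, x_hi, y_lo, y_hi, value).
pieces = [
    (0.0, 0.5, 0.0, 1.0, 1.2),  # left half, density 1.2
    (0.5, 1.0, 0.0, 0.5, 1.6),  # lower-right quadrant, density 1.6
    (0.5, 1.0, 0.5, 1.0, 0.0),  # upper-right quadrant, density 0.0
]  # k = 3; total mass = 1.2*0.5 + 1.6*0.25 + 0.0*0.25 = 1.0

def h(x, y):
    """Evaluate the k-histogram at (x, y); constant inside each rectangle."""
    for x_lo, x_hi, y_lo, y_hi, v in pieces:
        if x_lo <= x < x_hi and y_lo <= y < y_hi:
            return v
    return 0.0  # outside the domain

# Monte Carlo estimate of the L1 distance between h and a reference
# density f, here the uniform density on [0, 1)^2 for illustration.
# Since the domain has area 1, the mean of |h - f| over uniform points
# estimates the integral defining the L1 distance.
rng = np.random.default_rng(0)
pts = rng.random((100_000, 2))
f = lambda x, y: 1.0
l1 = np.mean([abs(h(x, y) - f(x, y)) for x, y in pts])
print(f"estimated L1 distance: {l1:.3f}")
```

On this example the exact distance is $|1.2 - 1| \cdot 0.5 + |1.6 - 1| \cdot 0.25 + |0 - 1| \cdot 0.25 = 0.5$, so the printed estimate should be close to that value.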

