
New Nearly-Optimal Coreset for Kernel Density Estimation

International Symposium on Computational Geometry (SoCG), 2020
Abstract

Given a point set $P\subset \mathbb{R}^d$, the kernel density estimate of $P$ with respect to the Gaussian kernel is defined as $\overline{\mathcal{G}}_P(x) = \frac{1}{|P|}\sum_{p\in P}e^{-\lVert x-p \rVert^2}$ for any $x\in\mathbb{R}^d$. We study how to construct a small subset $Q$ of $P$ such that the kernel density estimate of $P$ is approximated by that of $Q$ up to additive error $\varepsilon$, namely $\sup_{x\in\mathbb{R}^d}\lvert\overline{\mathcal{G}}_P(x)-\overline{\mathcal{G}}_Q(x)\rvert\le\varepsilon$. Such a subset $Q$ is called a \emph{coreset}. The main technique in this work is to construct a $\pm 1$ coloring of the point set $P$ using discrepancy theory and to apply this coloring algorithm recursively. Our result leverages Banaszczyk's theorem. When $d>1$ is a constant, our construction gives a coreset of size $O\left(\frac{1}{\varepsilon}\sqrt{\log\log\frac{1}{\varepsilon}}\right)$, compared with the best previously known bound of $O\left(\frac{1}{\varepsilon}\sqrt{\log\frac{1}{\varepsilon}}\right)$. It is the first result to break the $\sqrt{\log}$-factor barrier, even when $d=2$.
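To illustrate the recursive halving framework described above, here is a minimal Python sketch. It evaluates $\overline{\mathcal{G}}_P$ exactly as defined (no bandwidth parameter), builds $Q$ by repeatedly applying a $\pm 1$ coloring to the current set and keeping the $+1$ class, and measures the empirical error on a finite set of query points. The function names (`kde`, `halve`, `coreset`, `max_error`) are hypothetical. The coloring is a random pairing chosen only for simplicity. It is not the Banaszczyk-based discrepancy coloring of the paper and is not expected to achieve the coreset size stated above.

```python
import math
import random


def kde(P, x):
    """Gaussian KDE as defined above: (1/|P|) * sum over p in P of exp(-||x - p||^2)."""
    total = 0.0
    for p in P:
        sq_dist = sum((xj - pj) ** 2 for xj, pj in zip(x, p))
        total += math.exp(-sq_dist)
    return total / len(P)


def halve(P, rng):
    """One halving step: assign a +/-1 coloring to P and keep the +1 class.

    Assumption (hypothetical stand-in): points are paired at random and each
    pair gets opposite colors, so both classes have the same size. This is NOT
    the Banaszczyk-based coloring from the paper. If |P| is odd, the unpaired
    point is dropped.
    """
    idx = list(range(len(P)))
    rng.shuffle(idx)
    color = {}
    for a, b in zip(idx[0::2], idx[1::2]):
        sign = 1 if rng.random() < 0.5 else -1
        color[a], color[b] = sign, -sign
    # Keep the +1 class, preserving the original order of the points.
    return [P[i] for i in sorted(color) if color[i] == 1]


def coreset(P, target_size, seed=0):
    """Apply the halving step recursively until at most target_size points remain."""
    if target_size < 1:
        raise ValueError("target_size must be at least 1")
    rng = random.Random(seed)
    Q = list(P)
    while len(Q) > target_size:
        Q = halve(Q, rng)
    return Q


def max_error(P, Q, queries):
    """Empirical sup-norm error: max over the queries of |G_P(x) - G_Q(x)|."""
    return max(abs(kde(P, x) - kde(Q, x)) for x in queries)


if __name__ == "__main__":
    rng = random.Random(1)
    P = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2048)]
    Q = coreset(P, target_size=256)
    queries = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(100)]
    print(len(Q), max_error(P, Q, queries))
```

In the usage example, three halving steps reduce 2048 points to 256. The design keeps the coloring step isolated in `halve`, so a lower-discrepancy coloring could be swapped in without touching the recursion.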
