
Optimal Coreset for Gaussian Kernel Density Estimation

International Symposium on Computational Geometry (SoCG), 2020
Abstract

Given a point set $P\subset \mathbb{R}^d$, the kernel density estimate of $P$ for the Gaussian kernel is defined as $\overline{\mathcal{G}}_P(x) = \frac{1}{|P|}\sum_{p\in P}e^{-\lVert x-p \rVert^2}$ for any $x\in\mathbb{R}^d$. We study how to construct a small subset $Q$ of $P$ such that the kernel density estimate of $P$ is approximated by that of $Q$, i.e., $|\overline{\mathcal{G}}_P(x)-\overline{\mathcal{G}}_Q(x)|\le\varepsilon$ for all $x\in\mathbb{R}^d$. Such a subset $Q$ is called a coreset. The primary technique in this work is to construct a $\pm 1$ coloring of the point set $P$ using discrepancy theory and to apply this coloring algorithm recursively. Our result leverages Banaszczyk's Theorem. When $d>1$ is a constant, our construction gives a coreset of size $O\left(\frac{1}{\varepsilon}\right)$, improving on the best previously known bound of $O\left(\frac{1}{\varepsilon}\sqrt{\log\frac{1}{\varepsilon}}\right)$. This is the first result to break the $\sqrt{\log}$-factor barrier, even when $d=2$.
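The recursive-coloring idea can be illustrated with a short Python sketch. If a coloring $\chi: P \to \{\pm 1\}$ gives exactly half the points the color $+1$ and $Q$ is that half, then $\overline{\mathcal{G}}_Q(x) - \overline{\mathcal{G}}_P(x) = \frac{1}{|P|}\sum_{p\in P}\chi(p)\,e^{-\lVert x-p \rVert^2}$. A low-discrepancy coloring therefore yields a half-size set with small error, and repeating the step halves the set again. This is a minimal sketch under loudly stated assumptions. The error is measured only at a finite set of probe points. The coloring is a greedy pairwise balancing heuristic, NOT the Banaszczyk-based coloring used in the paper, so it carries no $O(1/\varepsilon)$ guarantee. The helper names `gaussian_kde`, `halve`, `coreset`, and `max_error` are hypothetical.

```python
import math
import random


def gaussian_kde(points, x):
    """Gaussian KDE as defined above: mean of exp(-||x - p||^2) over p in points."""
    total = 0.0
    for p in points:
        sq_dist = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
        total += math.exp(-sq_dist)
    return total / len(points)


def halve(points, probes, rng):
    """One halving step (hypothetical stand-in, NOT Banaszczyk's construction).

    Points are paired up and each pair receives opposite colors +1/-1. The
    sign is chosen greedily to keep the running signed kernel sum small at
    the probe points. The +1 side is returned.
    """
    pts = list(points)
    rng.shuffle(pts)
    balance = [0.0] * len(probes)  # running sum of chi(p) * k(p, probe)
    kept = []
    for i in range(0, len(pts) - 1, 2):
        a, b = pts[i], pts[i + 1]
        # Kernel difference of the pair at every probe point.
        diff = [gaussian_kde([a], q) - gaussian_kde([b], q) for q in probes]
        # ||balance + s*diff||^2 is minimized by s = -sign(<balance, diff>).
        dot = sum(s * d for s, d in zip(balance, diff))
        sign = -1.0 if dot > 0 else 1.0
        balance = [s + sign * d for s, d in zip(balance, diff)]
        kept.append(a if sign > 0 else b)  # keep whichever point got color +1
    if len(pts) % 2 == 1:
        kept.append(pts[-1])  # simplification: the unpaired point is kept
    return kept


def coreset(points, target_size, probes, seed=0):
    """Recursively halve until at most target_size points remain."""
    rng = random.Random(seed)
    target_size = max(1, target_size)  # halving never goes below one point
    current = list(points)
    while len(current) > target_size:
        current = halve(current, probes, rng)
    return current


def max_error(P, Q, probes):
    """Largest KDE discrepancy between P and Q over the probe points only."""
    return max(abs(gaussian_kde(P, x) - gaussian_kde(Q, x)) for x in probes)
```

As a usage example, drawing 256 points in $\mathbb{R}^2$ and calling `coreset(P, 32, probes)` performs three halving steps (256 → 128 → 64 → 32). `max_error(P, Q, probes)` then reports the observed error at the probes. A rigorous construction would instead bound the error for every $x\in\mathbb{R}^d$ through the discrepancy argument.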
