A Tight VC-Dimension Analysis of Clustering Coresets with Applications

Abstract

We consider coresets for $k$-clustering problems, where the goal is to assign points to centers minimizing powers of distances. A popular example is the $k$-median objective $\sum_{p\in P}\min_{c\in C}\operatorname{dist}(p,c)$. Given a point set $P$, a coreset $\Omega$ is a small weighted subset that approximates the cost of $P$ for all candidate solutions $C$ up to a $(1\pm\varepsilon)$ multiplicative factor. In this paper, we give a sharp VC-dimension based analysis for coreset construction. As a consequence, we obtain improved $k$-median coreset bounds for the following metrics:

- Coresets of size $\tilde{O}\left(k\varepsilon^{-2}\right)$ for shortest path metrics in planar graphs, improving over the bounds $\tilde{O}\left(k\varepsilon^{-6}\right)$ by [Cohen-Addad, Saulpic, Schwiegelshohn, STOC'21] and $\tilde{O}\left(k^2\varepsilon^{-4}\right)$ by [Braverman, Jiang, Krauthgamer, Wu, SODA'21].
- Coresets of size $\tilde{O}\left(kd\ell\varepsilon^{-2}\log m\right)$ for clustering $d$-dimensional polygonal curves of length at most $m$ with curves of length at most $\ell$ with respect to Fréchet metrics, improving over the bounds $\tilde{O}\left(k^3d\ell\varepsilon^{-3}\log m\right)$ by [Braverman, Cohen-Addad, Jiang, Krauthgamer, Schwiegelshohn, Toftrup, and Wu, FOCS'22] and $\tilde{O}\left(k^2d\ell\varepsilon^{-2}\log m \log |P|\right)$ by [Conradi, Kolbe, Psarros, Rohde, SoCG'24].
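To make the coreset definition concrete, the following is a minimal sketch, not the paper's construction. It assumes Euclidean distance purely for illustration (the paper treats planar shortest path and Fréchet metrics), and the function names `kmedian_cost` and `within_coreset_error` are hypothetical. It evaluates the weighted $k$-median cost and checks the $(1\pm\varepsilon)$ condition for a single candidate solution $C$; a true coreset must satisfy it for all candidate solutions.

```python
import math


def kmedian_cost(points, centers, weights=None):
    """Weighted k-median cost: sum over p of w(p) * min over c of dist(p, c).

    Assumption: dist is the Euclidean distance (illustration only).
    """
    if weights is None:
        weights = [1.0] * len(points)
    return sum(
        w * min(math.dist(p, c) for c in centers)
        for p, w in zip(points, weights)
    )


def within_coreset_error(points, coreset, coreset_weights, centers, eps):
    """Check |cost(coreset, C) - cost(P, C)| <= eps * cost(P, C) for ONE C."""
    full = kmedian_cost(points, centers)
    approx = kmedian_cost(coreset, centers, coreset_weights)
    return abs(approx - full) <= eps * full


# Toy input: duplicated points collapse into an exact weighted coreset.
P = [(0.0, 0.0), (0.0, 0.0), (4.0, 0.0), (4.0, 0.0)]
Omega = [(0.0, 0.0), (4.0, 0.0)]
Omega_weights = [2.0, 2.0]
C = [(1.0, 0.0)]

print(kmedian_cost(P, C))
print(within_coreset_error(P, Omega, Omega_weights, C, eps=0.1))
```

In this toy input the weighted subset reproduces the cost exactly because each coreset point stands in for two identical input points; the bounds in the abstract concern how small $\Omega$ can be while keeping the error within $\varepsilon$ for every $C$.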
