
Performance of Johnson-Lindenstrauss Transform for k-Means and k-Medians Clustering

Abstract

Consider an instance of Euclidean $k$-means or $k$-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of $(1+\varepsilon)$ under a projection onto a random $O(\log(k/\varepsilon)/\varepsilon^2)$-dimensional subspace. Further, the cost of every clustering is preserved within $(1+\varepsilon)$. More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean $k$-clustering with the distances raised to the $p$-th power for any constant $p$. For $k$-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for $k$-medians, it answers a question raised by Kannan.
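As an illustration of the "cost of every clustering is preserved" statement, the sketch below projects synthetic data with a dense Gaussian matrix scaled by $1/\sqrt{d}$ (one example of a map with sub-Gaussian tails) and compares the $k$-means cost of one fixed partition before and after projection. It is a minimal sketch under stated assumptions, not the paper's method: the constant `c = 1` in the target dimension is hypothetical (the abstract gives only the $O(\cdot)$ form), and the helpers `target_dim`, `gaussian_projection`, `kmeans_cost`, and `make_blobs` are illustrative names. It checks a single partition only, not the guarantee for the optimal cost or for all clusterings simultaneously.

```python
import math
import random


def target_dim(k, eps, c=1.0):
    """d = ceil(c * log(k/eps) / eps^2); the constant c is a hypothetical choice."""
    return math.ceil(c * math.log(k / eps) / eps ** 2)


def gaussian_projection(points, d, seed=0):
    """Map each point x to Gx, where G is d x dim with i.i.d. N(0, 1/d) entries."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Scaling by 1/sqrt(d) makes E[||Gx||^2] = ||x||^2 for every fixed x.
    g = [[rng.gauss(0.0, 1.0) / math.sqrt(d) for _ in range(dim)] for _ in range(d)]
    return [[sum(row[j] * x[j] for j in range(dim)) for row in g] for x in points]


def kmeans_cost(points, labels, k):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x, c in zip(points, labels):
        counts[c] += 1
        for j in range(dim):
            sums[c][j] += x[j]
    centroids = [[s / counts[c] if counts[c] else 0.0 for s in sums[c]] for c in range(k)]
    return sum(
        sum((x[j] - centroids[c][j]) ** 2 for j in range(dim))
        for x, c in zip(points, labels)
    )


def make_blobs(n_per_cluster, k, dim, spread=1.0, seed=1):
    """Synthetic data: k random centers, each with Gaussian noise around it."""
    rng = random.Random(seed)
    points, labels = [], []
    for c in range(k):
        center = [rng.uniform(-10.0, 10.0) for _ in range(dim)]
        for _ in range(n_per_cluster):
            points.append([m + rng.gauss(0.0, spread) for m in center])
            labels.append(c)
    return points, labels


k, eps = 3, 0.5
points, labels = make_blobs(60, k, 300)
d = target_dim(k, eps)  # ceil(log(6) / 0.25) = 8
projected = gaussian_projection(points, d)
ratio = kmeans_cost(projected, labels, k) / kmeans_cost(points, labels, k)
print(f"projected from 300 to {d} dimensions; cost ratio = {ratio:.3f}")
```

Because the projection is linear, the centroid of a projected cluster equals the projection of the original centroid, so the cost of a fixed partition is a sum of projected squared norms $\|G(x - \mu_C)\|^2$. This is why a Johnson-Lindenstrauss-style map controls it; note that the target dimension here depends on $k$ and $\varepsilon$ but not on the number of points.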
