Performance of Johnson-Lindenstrauss Transform for k-Means and k-Medians Clustering

Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1 + ε) under a projection onto a random O(log(k/ε) / ε²)-dimensional subspace. Further, the cost of every clustering is preserved within (1 + ε). More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the p-th power for any constant p. For k-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for k-medians, it answers a question raised by Kannan.
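
The following is a minimal sketch, not the paper's construction: it projects synthetic data with a standard random Gaussian matrix (one example of a map with sub-Gaussian tails) onto roughly log(k/ε)/ε² dimensions and compares the k-means costs found by Lloyd's algorithm before and after. The data, the omitted constant in the target dimension, and the use of scikit-learn's KMeans as a stand-in for the optimal cost are all illustrative assumptions.

```python
# Illustrative sketch: random Gaussian projection before k-means clustering.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n, D, k, eps = 2000, 500, 10, 0.2

# Synthetic, well-separated clusters in R^D (illustrative data only).
centers = rng.normal(scale=10.0, size=(k, D))
X = centers[rng.integers(k, size=n)] + rng.normal(size=(n, D))

# Target dimension ~ log(k/eps)/eps^2; the absolute constant is omitted here.
d = int(np.ceil(np.log(k / eps) / eps**2))

# Random Gaussian projection, scaled so squared norms are preserved in expectation.
G = rng.normal(size=(D, d)) / np.sqrt(d)
Y = X @ G

# Lloyd's algorithm gives an approximation to the optimal k-means cost (inertia_).
cost_original = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
cost_projected = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Y).inertia_
print(f"k-means cost in R^{D}: {cost_original:.1f}")
print(f"k-means cost in R^{d}: {cost_projected:.1f}  (expected within a (1 +/- eps) factor)")
```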