Random Projections for $k$-means Clustering

Abstract

This paper discusses the topic of dimensionality reduction for $k$-means clustering. We prove that any set of $n$ points in $d$ dimensions (rows in a matrix $A \in \mathbb{R}^{n \times d}$) can be projected into $t = \Omega(k / \varepsilon^2)$ dimensions, for any $\varepsilon \in (0, 1/3)$, in $O(n d \lceil \varepsilon^{-2} k / \log(d) \rceil)$ time, such that with constant probability the optimal $k$-partition of the point set is preserved within a factor of $2 + \varepsilon$. The projection is done by post-multiplying $A$ with a $d \times t$ random matrix $R$ having entries $+1/\sqrt{t}$ or $-1/\sqrt{t}$ with equal probability. A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results.
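The projection step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names `random_sign_matrix` and `project`, the constant `c` hidden in $t = \Omega(k / \varepsilon^2)$, and the `seed` parameter are all assumptions made here for the example. It uses an ordinary dense matrix product, so it does not attempt to reproduce the running time stated in the abstract.

```python
import math

import numpy as np


def random_sign_matrix(d, t, rng):
    """Return a d x t matrix with entries +1/sqrt(t) or -1/sqrt(t), each with probability 1/2."""
    signs = rng.choice([-1.0, 1.0], size=(d, t))
    return signs / math.sqrt(t)


def project(A, k, eps, c=1.0, seed=0):
    """Project the rows of A (n x d) into t = ceil(c * k / eps^2) dimensions.

    `c` is a hypothetical stand-in for the constant hidden in the Omega(.)
    bound; the abstract does not give its value.
    """
    if not 0.0 < eps < 1.0 / 3.0:
        raise ValueError("eps must lie in the open interval (0, 1/3)")
    n, d = A.shape
    t = math.ceil(c * k / eps**2)
    rng = np.random.default_rng(seed)
    R = random_sign_matrix(d, t, rng)
    # Post-multiply A by R, as in the abstract: the result is n x t.
    return A @ R, R
```

Any standard $k$-means routine can then be run on the rows of the projected matrix in place of the rows of $A$; the resulting partition is the one whose cost the abstract bounds by a factor of $2 + \varepsilon$ relative to the optimum, with constant probability.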
