This paper studies the $k$-means++ algorithm for clustering as well as the class of $D^\ell$ sampling algorithms to which $k$-means++ belongs. It is shown that for any constant factor $\beta > 1$, selecting $\beta k$ cluster centers by $D^\ell$ sampling yields a constant-factor approximation to the optimal clustering with $k$ centers, in expectation and without conditions on the dataset. This result extends the previously known $O(\log k)$ guarantee for the case $\beta = 1$ to the constant-factor bi-criteria regime. It also improves upon an existing constant-factor bi-criteria result that holds only with constant probability.
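The sampling scheme discussed above can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes the usual seeding rule in which the first center is drawn uniformly and each later center is drawn with probability proportional to the distance to the nearest chosen center raised to a power `ell` (with `ell = 2` giving the familiar k-means++ seeding). The names `d_ell_sampling`, `clustering_cost`, `beta`, and `ell`, and the choice to round the oversampled number of centers up with `ceil`, are assumptions made for illustration.

```python
import math
import random


def d_ell_sampling(points, num_centers, ell=2.0, seed=None):
    """Choose num_centers of the given points as centers (illustrative sketch)."""
    rng = random.Random(seed)
    # First center: drawn uniformly at random from the data.
    centers = [rng.choice(points)]
    # weights[i] = (distance from points[i] to its nearest center) ** ell
    weights = [math.dist(p, centers[0]) ** ell for p in points]
    while len(centers) < num_centers:
        if sum(weights) == 0.0:
            # Every point coincides with a center; fall back to uniform choice.
            nxt = rng.choice(points)
        else:
            # Draw the next center with probability proportional to its weight.
            nxt = rng.choices(points, weights=weights, k=1)[0]
        centers.append(nxt)
        # Update each point's distance to the nearest center.
        weights = [min(w, math.dist(p, nxt) ** ell) for p, w in zip(points, weights)]
    return centers


def clustering_cost(points, centers, ell=2.0):
    """Sum over points of the ell-th power of the distance to the nearest center."""
    return sum(min(math.dist(p, c) ** ell for c in centers) for p in points)


if __name__ == "__main__":
    data_rng = random.Random(0)
    # Three loose groups of 2-D points (synthetic, for illustration only).
    data = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
            for cx, cy in [(0, 0), (5, 5), (10, 0)] for _ in range(30)]
    k, beta = 3, 2.0
    # Bi-criteria setting: sample ceil(beta * k) centers, compare against k.
    centers = d_ell_sampling(data, math.ceil(beta * k), ell=2.0, seed=1)
    print(len(centers), clustering_cost(data, centers))
```

In the bi-criteria setting, the cost obtained with the oversampled centers is compared against the optimal cost with only `k` centers. The sketch only computes the former, since finding the optimum is itself hard.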