
This paper shows how to adapt several simple and classical sampling-based algorithms for the $k$-means problem to the setting with outliers. Recently, Bhaskara et al. (NeurIPS 2019) showed how to adapt the classical $k$-means++ algorithm to the setting with outliers. However, their algorithm needs to output $O(\log(k) \cdot z)$ outliers, where $z$ is the number of true outliers, to match the $O(\log k)$-approximation guarantee of $k$-means++. In this paper, we build on their ideas and show how to adapt several sequential and distributed $k$-means algorithms to the setting with outliers, but with substantially stronger theoretical guarantees: our algorithms output $(1+\varepsilon)z$ outliers while achieving an $O(1/\varepsilon)$-approximation to the objective function. In the sequential world, we achieve this by adapting a recent algorithm of Lattanzi and Sohler (ICML 2019). In the distributed setting, we adapt a simple algorithm of Guha et al. (IEEE Trans. Know. and Data Engineering 2003) and the popular $k$-means$\|$ of Bahmani et al. (PVLDB 2012). A theoretical application of our techniques is an algorithm with running time $\tilde{O}(nk^2/z)$ that achieves an $O(1)$-approximation to the objective function while outputting $O(z)$ outliers, assuming $k \ll z \ll n$. This is complemented with a matching lower bound of $\Omega(nk^2/z)$ for this problem in the oracle model.