
Improved MapReduce and Streaming Algorithms for k-Center Clustering (with Outliers)

Abstract

Given a set S of points from some metric space and a positive integer k < |S|, the k-center clustering problem requires identifying a subset of k centers in S that minimizes the maximum distance of any point from its closest center. A more general formulation of the problem features a further parameter z and allows up to z points of S (outliers) to be disregarded when computing the maximum distance from the centers. We present improved, coreset-based 2-round MapReduce algorithms for the above two formulations of the problem, and a 1-pass Streaming algorithm for the case with outliers. For any ϵ > 0, the algorithms yield solutions whose approximation ratios are a mere additive term ϵ away from those achievable by the best polynomial-time sequential algorithms. The amount of local/working memory required by our algorithms is analyzed in terms of the doubling dimension D of the metric space, and the algorithms are guaranteed to be very space efficient for constant D. The theoretical results are complemented by a set of experiments showing that our approach yields better-quality solutions than the state of the art at the expense of only a mild performance penalty.
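To make the two objectives and the general shape of a 2-round coreset approach concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' algorithm. It assumes Euclidean points, uses Gonzalez's farthest-first traversal (a classic sequential 2-approximation for k-center without outliers) as both the local and the final solver, and simulates the two MapReduce rounds in a single process. The names `radius_with_outliers`, `farthest_first`, `two_round_kcenter` and the fixed per-partition coreset size `tau` are hypothetical. The paper's coresets are instead sized according to ϵ and the doubling dimension D and carry weights, and the case with outliers requires a different final solver.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(p: Point, q: Point) -> float:
    # Assumption: Euclidean distance; any metric would do.
    return math.dist(p, q)


def radius_with_outliers(points: Sequence[Point], centers: Sequence[Point], z: int = 0) -> float:
    """k-center objective: maximum distance of a point from its closest center,
    after disregarding the z points farthest from the centers (the outliers)."""
    d = sorted(min(dist(p, c) for c in centers) for p in points)
    kept = d[: max(len(d) - z, 0)]  # drop the z largest distances
    return kept[-1] if kept else 0.0


def farthest_first(points: Sequence[Point], k: int) -> List[Point]:
    """Gonzalez's farthest-first traversal (2-approximation, no outliers)."""
    centers = [points[0]]
    nearest = [dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # Next center: the point currently farthest from all chosen centers.
        i = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[i])
        nearest = [min(nd, dist(p, points[i])) for nd, p in zip(nearest, points)]
    return centers


def two_round_kcenter(points: Sequence[Point], k: int, num_parts: int, tau: int) -> List[Point]:
    """Hypothetical 2-round scheme, simulated sequentially."""
    # Round 1: each reducer gets one partition and keeps tau >= k
    # farthest-first points as its local coreset.
    parts = [list(points[i::num_parts]) for i in range(num_parts)]
    coreset = [c for part in parts if part for c in farthest_first(part, tau)]
    # Round 2: a single reducer solves k-center on the union of the coresets.
    return farthest_first(coreset, k)
```

For instance, with the points 0, 1, 2, 10, 11, 12 and 100 on a line, the centers 1 and 11 have radius 89 when z = 0, but radius 1 when z = 1, because the point 100 is then disregarded as an outlier. Letting each partition keep more than k points is what allows the small union of local solutions to stand in for the whole input in the second round.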
