Improved MapReduce and Streaming Algorithms for $k$-Center Clustering (with Outliers)

Given a set $S$ of points from some metric space and a positive integer $k$, the $k$-center clustering problem asks for a subset of $k$ centers in $S$ that minimizes the maximum distance of any point of $S$ from its closest center. A more general formulation features a further parameter $z$ and allows up to $z$ points of $S$ (outliers) to be disregarded when computing the maximum distance from the centers. We present improved, coreset-based 2-round MapReduce algorithms for both formulations of the problem, and a 1-pass Streaming algorithm for the case with outliers. For any $\epsilon > 0$, the algorithms yield solutions whose approximation ratios are a mere additive term $\epsilon$ away from those achievable by the best polynomial-time sequential algorithms. The amount of local/working memory required by our algorithms is analyzed in terms of the doubling dimension $D$ of the metric space, and the algorithms are guaranteed to be very space efficient for constant $D$. The theoretical results are complemented by a set of experiments showing that our approach yields better-quality solutions than the state of the art, at the expense of only a mild performance penalty.
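To make the two objectives concrete, the following is a minimal Python sketch, not the coreset-based MapReduce or Streaming algorithms of the paper. It assumes Euclidean distance as the metric, and the names `radius` and `farthest_first` are illustrative only. `radius` evaluates the $k$-center cost of a given set of centers, optionally disregarding the `z` farthest points as outliers. `farthest_first` is the classical greedy farthest-first traversal, a standard sequential 2-approximation for the formulation without outliers, included only as a baseline for intuition.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def radius(points: Sequence[Point], centers: Sequence[Point], z: int = 0) -> float:
    """k-center cost: maximum distance of a point from its closest center,
    after disregarding the z points farthest from the centers (outliers)."""
    # Distance of every point to its closest center, in ascending order.
    d = sorted(min(math.dist(p, c) for c in centers) for p in points)
    # Drop the z largest distances; guard against z exceeding the input size.
    kept = d[: max(0, len(d) - z)]
    return kept[-1] if kept else 0.0


def farthest_first(points: Sequence[Point], k: int) -> List[Point]:
    """Greedy farthest-first traversal (assumed baseline, no outliers):
    start from an arbitrary point, then repeatedly add the point that is
    farthest from the centers chosen so far."""
    centers = [points[0]]
    # closest[j] = distance of points[j] from its closest chosen center.
    closest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        i = max(range(len(points)), key=closest.__getitem__)
        centers.append(points[i])
        closest = [min(closest[j], math.dist(points[j], points[i]))
                   for j in range(len(points))]
    return centers


if __name__ == "__main__":
    # Hypothetical toy data: two tight groups and one far-away point.
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (100.0, 0.0)]
    print(farthest_first(pts, 2))
    print(radius(pts, [(0.0, 0.0), (10.0, 0.0)], z=0))
    print(radius(pts, [(0.0, 0.0), (10.0, 0.0)], z=1))
```

On the toy data, the far-away point dominates the cost when `z=0`, while allowing a single outlier (`z=1`) lets the same two centers achieve a much smaller radius. This is the effect the formulation with outliers is meant to capture.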