Distributed k-Clustering for Data with Heavy Noise

18 October 2018
Xiangyu Guo
Shi Li
arXiv:1810.07852
Abstract

In this paper, we consider the k-center/median/means clustering with outliers problems (or the (k, z)-center/median/means problems) in the distributed setting. Most previous distributed algorithms have communication costs that depend linearly on z, the number of outliers. Recently, Guha et al. overcame this dependence by considering bi-criteria approximation algorithms that output solutions with 2z outliers. When z is large, discarding an extra z points may be too wasteful, considering that the data gathering process can be costly. In this paper, we improve the number of outliers to the best possible (1+ε)z, while maintaining the O(1)-approximation ratio and keeping the communication cost independent of z. The problems we consider include the (k, z)-center problem, and the (k, z)-median/means problems in Euclidean metrics. An implementation of our algorithm for (k, z)-center shows that it outperforms many previous algorithms, both in terms of communication cost and the quality of the output solution.
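To make the objective concrete, here is a minimal centralized sketch of how the (k, z)-center cost can be evaluated: each point is assigned to its nearest center, the z farthest points are discarded as outliers, and the cost is the largest distance among the remaining points. This is an illustration of the problem definition only, not the distributed algorithm from the paper; the function name kz_center_cost and the toy data are assumptions made for the example.

```python
import numpy as np

def kz_center_cost(points: np.ndarray, centers: np.ndarray, z: int) -> float:
    """Illustrative (k, z)-center objective: the maximum distance of any
    non-outlier point to its nearest center, after discarding the z
    farthest points as outliers. Not the paper's algorithm."""
    # Distance from every point to its nearest center, shape (n,).
    dists = np.min(
        np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2),
        axis=1,
    )
    if z >= dists.size:
        return 0.0
    # Drop the z largest distances (the outliers); the cost is the max of the rest.
    return float(np.sort(dists)[dists.size - z - 1])

# Toy usage: 100 random points in the plane, 3 centers, up to z = 5 outliers.
rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 2))
ctrs = pts[rng.choice(100, size=3, replace=False)]
print(kz_center_cost(pts, ctrs, z=5))
```

The (k, z)-median and (k, z)-means objectives are analogous, replacing the maximum with the sum of distances (respectively, squared distances) over the non-outlier points.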
