Relax and Merge: A Simple Yet Effective Framework for Solving Fair $k$-Means and $k$-sparse Wasserstein Barycenter Problems

Abstract

The fairness of clustering algorithms has gained widespread attention across various areas, including machine learning. In this paper, we study fair $k$-means clustering in Euclidean space. Given a dataset comprising several groups, the fairness constraint requires each cluster to contain a proportion of points from each group that lies within specified lower and upper bounds. These fairness constraints make determining the optimal locations of the $k$ centers a challenging task. We propose a novel "Relax and Merge" framework that returns a $(1+4\rho+O(\epsilon))$-approximate solution, where $\rho$ is the approximation ratio of an off-the-shelf vanilla $k$-means algorithm and $\epsilon > 0$ can be arbitrarily small. If equipped with a PTAS for $k$-means (so that $\rho = 1+\epsilon$ and $1+4\rho = 5+4\epsilon$), our solution achieves an approximation ratio of $5+O(\epsilon)$ with only a slight violation of the fairness constraints, which improves the current state-of-the-art approximation guarantee. Furthermore, using our framework, we can also obtain a $(1+4\rho+O(\epsilon))$-approximate solution for the $k$-sparse Wasserstein Barycenter problem, a fundamental optimization problem in the field of optimal transport, and a $(2+6\rho)$-approximate solution for strictly fair $k$-means clustering, which permits no violation of the fairness constraints; both results improve on the current state-of-the-art methods. In addition, empirical results demonstrate that our proposed algorithm can significantly outperform baseline approaches in terms of clustering cost.
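To make the fairness constraint and the clustering cost concrete, the sketch below checks a given assignment against proportional bounds and computes the vanilla $k$-means cost. It is a minimal illustration, assuming the common formulation in which, for every cluster $C$ and group $g$, the number of points of $g$ in $C$ lies between `lower[g] * |C|` and `upper[g] * |C|`. The names `is_fair`, `kmeans_cost`, `lower`, and `upper` are hypothetical and do not come from the paper, and the code only verifies a solution; it does not implement the Relax and Merge framework itself.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def is_fair(
    assignment: Sequence[int],
    groups: Sequence[Hashable],
    lower: Dict[Hashable, float],
    upper: Dict[Hashable, float],
) -> bool:
    """Return True if every nonempty cluster satisfies the proportional bounds.

    Assumed constraint (hypothetical formulation): for each cluster C and group g,
        lower[g] * |C| <= |C ∩ g| <= upper[g] * |C|.
    `lower` and `upper` are assumed to share the same group keys.
    """
    # Count how many points of each group are assigned to each cluster.
    per_cluster: Dict[int, Counter] = defaultdict(Counter)
    for cluster, group in zip(assignment, groups):
        per_cluster[cluster][group] += 1

    for counts in per_cluster.values():
        size = sum(counts.values())
        for g in lower:
            members = counts.get(g, 0)
            if not (lower[g] * size <= members <= upper[g] * size):
                return False
    return True


def kmeans_cost(
    points: Sequence[Sequence[float]],
    centers: Sequence[Sequence[float]],
    assignment: Sequence[int],
) -> float:
    """Sum of squared Euclidean distances from each point to its assigned center."""
    return sum(
        sum((p_i - c_i) ** 2 for p_i, c_i in zip(p, centers[a]))
        for p, a in zip(points, assignment)
    )


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (5, 5), (5, 6)]
    groups = ["a", "b", "a", "b"]
    bounds_lo = {"a": 0.25, "b": 0.25}
    bounds_hi = {"a": 0.75, "b": 0.75}
    print(is_fair([0, 0, 1, 1], groups, bounds_lo, bounds_hi))  # True
    print(is_fair([0, 0, 0, 1], groups, bounds_lo, bounds_hi))  # False
    print(kmeans_cost(points, [(0, 0.5), (5, 5.5)], [0, 0, 1, 1]))  # 1.0
```

In the second call, the cluster containing only one point of group `b` has no point of group `a`, so it falls below the assumed lower bound and the assignment is reported as unfair. A "slight violation" of the constraints, as mentioned above, would correspond to relaxing these inequalities by some tolerance; how the paper quantifies that tolerance is not specified here.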
