In data summarization we want to choose $k$ prototypes in order to summarize a data set. We study a setting where the data set comprises several demographic groups and we are restricted to choose $k_i$ prototypes belonging to group $i$. A common approach to the problem without the fairness constraint is to optimize a centroid-based clustering objective such as $k$-center. A natural extension then is to incorporate the fairness constraint into the clustering problem. Existing algorithms for doing so run in time super-quadratic in the size of the data set, whereas the standard $k$-center problem can be approximated in linear time. In this paper, we resolve this gap by providing a simple approximation algorithm for the $k$-center problem under the fairness constraint with running time linear in the size of the data set and $k$. If the number of demographic groups is small, the approximation guarantee of our algorithm only incurs a constant-factor overhead.
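For orientation, the unconstrained baseline mentioned above can be illustrated with the classical greedy farthest-first heuristic, which is known to give a 2-approximation for $k$-center using a number of distance evaluations linear in the size of the data set and in $k$. The following is a minimal sketch of that baseline only, not the fair algorithm of this paper; the function name `greedy_k_center`, the use of Euclidean distance, and the choice of the first point as the initial center are assumptions made for illustration.

```python
import math


def greedy_k_center(points, k):
    """Farthest-first traversal for unconstrained k-center (illustrative sketch).

    Returns the indices of the chosen centers and the resulting radius,
    i.e. the largest distance from any point to its nearest center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    # Assumption: start from the first point; any starting point works.
    centers = [0]
    # dist[j] holds the distance from point j to its nearest chosen center.
    dist = [math.dist(points[0], p) for p in points]

    while len(centers) < k:
        # Pick the point that is currently farthest from all chosen centers.
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        # One pass over the data updates the nearest-center distances.
        for j, p in enumerate(points):
            d = math.dist(points[nxt], p)
            if d < dist[j]:
                dist[j] = d

    return centers, max(dist)


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (10, 0), (11, 0)]
    print(greedy_k_center(data, 2))
```

Each of the $k$ rounds makes a single pass over the data, which is where the linear dependence on the data set size and on $k$ comes from. The sketch ignores group membership entirely, so it cannot enforce a requirement of $k_i$ prototypes per group.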