
Nearly-Tight and Oblivious Algorithms for Explainable Clustering

Abstract

We study the problem of explainable clustering in the setting first formalized by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). A $k$-clustering is said to be explainable if it is given by a decision tree where each internal node splits data points with a threshold cut in a single dimension (feature), and each of the $k$ leaves corresponds to a cluster. We give an algorithm that outputs an explainable clustering that loses at most a factor of $O(\log^2 k)$ compared to an optimal (not necessarily explainable) clustering for the $k$-medians objective, and a factor of $O(k \log^2 k)$ for the $k$-means objective. This improves over the previous best upper bounds of $O(k)$ and $O(k^2)$, respectively, and nearly matches the previous $\Omega(\log k)$ lower bound for $k$-medians and our new $\Omega(k)$ lower bound for $k$-means. The algorithm is remarkably simple. In particular, given an initial (not necessarily explainable) clustering in $\mathbb{R}^d$, it is oblivious to the data points and runs in time $O(dk \log^2 k)$, independent of the number of data points $n$. Our upper and lower bounds also generalize to objectives given by higher $\ell_p$-norms.
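
To make the model concrete, the following is a minimal sketch (not the paper's algorithm, and with none of its guarantees) of what an explainable clustering looks like: a decision tree whose internal nodes are single-coordinate threshold cuts and whose $k$ leaves are the clusters. As in the abstract, the tree below is built only from $k$ reference centers, so it is oblivious to the data points; it simply applies random axis-aligned cuts until every leaf holds one center. All names and the cut-selection rule are illustrative assumptions, and distinct centers are assumed.

```python
import random

def build_tree(centers, dims):
    """Recursively split a set of (assumed distinct) centers with axis-aligned threshold cuts."""
    if len(centers) == 1:
        return {"leaf": centers[0]}            # one leaf per center = one cluster
    while True:
        j = random.randrange(dims)             # pick a random coordinate (feature)
        lo = min(c[j] for c in centers)
        hi = max(c[j] for c in centers)
        if lo < hi:                            # a cut in dimension j can separate centers
            theta = random.uniform(lo, hi)     # random threshold between extreme centers
            left = [c for c in centers if c[j] <= theta]
            right = [c for c in centers if c[j] > theta]
            if left and right:                 # keep the cut only if it splits the centers
                return {"dim": j, "threshold": theta,
                        "left": build_tree(left, dims),
                        "right": build_tree(right, dims)}

def assign(tree, x):
    """Follow threshold cuts from the root to a leaf; the leaf is x's cluster."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["dim"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

# Usage: three centers in R^2 yield a tree with three leaves; data points are
# only touched at assignment time, never while building the tree.
centers = [(0.0, 0.0), (5.0, 1.0), (2.0, 6.0)]
tree = build_tree(centers, dims=2)
print(assign(tree, (4.5, 0.5)))                # lands in the leaf of center (5.0, 1.0)
```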
