Tree-Guided $L_1$-Convex Clustering

Abstract

Convex clustering is a modern clustering framework that guarantees globally optimal solutions and performs comparably to other advanced clustering methods. However, obtaining a complete dendrogram (clusterpath) for large-scale datasets remains computationally challenging because iterative optimization approaches are expensive. To address this limitation, we develop a novel convex clustering algorithm called Tree-Guided $L_1$-Convex Clustering (TGCC). We first observe that the loss function of $L_1$-convex clustering with tree-structured weights can be optimized efficiently with a dynamic programming approach. We then develop an efficient cluster fusion algorithm that exploits the tree structure of the weights to accelerate the optimization and to eliminate the cluster splits commonly observed in convex clustering. By combining the dynamic programming approach with the cluster fusion algorithm, TGCC achieves superior computational efficiency without sacrificing clustering performance. Remarkably, TGCC can construct a complete clusterpath for $10^6$ points in $\mathbb{R}^2$ within 15 seconds on a standard laptop, without parallel or distributed computing frameworks. Moreover, we extend TGCC to obtain biclustering and sparse convex clustering algorithms.
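The dynamic-programming idea can be illustrated with a small sketch. It assumes the standard convex-clustering objective with squared-error fidelity and an $L_1$ fusion penalty, $\min_U \tfrac12\sum_i \|x_i-u_i\|_2^2 + \lambda\sum_{(i,j)\in E} w_{ij}\|u_i-u_j\|_1$. The weights $w_{ij}$ are positive only on the edges $E$ of a spanning tree; a minimum spanning tree would be one choice, though that choice is our assumption. Because the $L_1$ norm separates over coordinates, the problem splits into one tree-structured fused-lasso problem per feature. Each of these can be solved exactly in two passes:

- Upward pass: clipped, piecewise-linear derivative messages travel from the leaves to a root.
- Downward pass: each child clips its parent's value to its own interval.

The names `tree_fused_lasso_1d` and `l1_convex_clustering_tree` are hypothetical. This simple variant can take quadratic time in the worst case, and it is not the authors' TGCC implementation, which also includes the cluster fusion step.

```python
import numpy as np


def tree_fused_lasso_1d(x, edges, lam):
    """Exactly minimize 0.5*sum_i (u_i - x_i)^2 + lam * sum_{(a, b, w)} w*|u_a - u_b|.

    Hypothetical helper (not the TGCC code). `edges` must be (a, b, w) triples
    forming a spanning tree on nodes 0..n-1, with lam * w > 0.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    adj = [[] for _ in range(n)]
    for a, b, w in edges:
        adj[a].append((b, lam * w))
        adj[b].append((a, lam * w))

    # Root the tree at node 0; `order` is a pre-order (parents before children).
    parent, pen = [-1] * n, [0.0] * n
    order, stack, seen = [], [0], [False] * n
    seen[0] = True
    while stack:
        v = stack.pop()
        order.append(v)
        for c, p in adj[v]:
            if not seen[c]:
                seen[c] = True
                parent[c], pen[c] = v, p
                stack.append(c)

    msgs = [[] for _ in range(n)]  # clipped derivative messages from children
    lo, hi = np.zeros(n), np.zeros(n)

    def derivative_knots(v):
        # d_v(z) = (z - x_v) + sum_c clip(d_c(z), -p_c, p_c): piecewise linear, slope >= 1.
        if not msgs[v]:
            return np.empty(0), np.empty(0)
        knots = np.unique(np.concatenate([k for k, _ in msgs[v]]))
        vals = knots - x[v] + sum(np.interp(knots, k, val) for k, val in msgs[v])
        return knots, vals

    def solve(v, knots, vals, t):
        # Invert the strictly increasing d_v: return z with d_v(z) = t.
        if knots.size == 0:
            return x[v] + t
        if t <= vals[0]:
            return knots[0] + (t - vals[0])  # slope is exactly 1 left of all knots
        if t >= vals[-1]:
            return knots[-1] + (t - vals[-1])  # and right of all knots
        return float(np.interp(t, vals, knots))

    # Upward pass: each child sends clip(d_c, -p_c, p_c), stored as knots/values.
    for v in reversed(order[1:]):
        knots, vals = derivative_knots(v)
        p = pen[v]
        lo[v], hi[v] = solve(v, knots, vals, -p), solve(v, knots, vals, p)
        inner = (knots > lo[v]) & (knots < hi[v])
        msgs[parent[v]].append((
            np.concatenate(([lo[v]], knots[inner], [hi[v]])),
            np.concatenate(([-p], vals[inner], [p])),
        ))

    # Root solves d_root(z) = 0; downward pass clips each parent value into [lo, hi].
    u = np.empty(n)
    knots, vals = derivative_knots(order[0])
    u[order[0]] = solve(order[0], knots, vals, 0.0)
    for v in order[1:]:
        u[v] = min(max(u[parent[v]], lo[v]), hi[v])
    return u


def l1_convex_clustering_tree(X, edges, lam):
    """The L1 fusion penalty separates over features, so solve each column independently."""
    X = np.asarray(X, dtype=float)
    return np.column_stack(
        [tree_fused_lasso_1d(X[:, j], edges, lam) for j in range(X.shape[1])]
    )


X = np.array([[0.0, 0.0], [0.2, 0.0], [10.0, 10.0], [10.2, 10.0]])
chain = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
U = l1_convex_clustering_tree(X, chain, lam=1.0)
print(np.unique(U, axis=0).shape[0])  # → 2 fused clusters
```

Rows of `U` that coincide are fused into one cluster. Re-solving over a grid of `lam` values gives a crude approximation of the clusterpath, but it lacks the cluster fusion step that TGCC uses to speed things up and to avoid splits.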

@article{zhang2025_2503.24012,
  title={ Tree-Guided $L_1$-Convex Clustering },
  author={ Bingyuan Zhang and Yoshikazu Terada },
  journal={arXiv preprint arXiv:2503.24012},
  year={ 2025 }
}