arXiv:2505.22991

Number of Clusters in a Dataset: A Regularized K-means Approach

29 May 2025
B. Kamgar-Parsi
B. Kamgar-Parsi
Main: 13 pages
Figures: 17
Bibliography: 2 pages
Appendix: 4 pages
Abstract

Finding the number of meaningful clusters in an unlabeled dataset is important in many applications. The regularized k-means algorithm is frequently used to find the correct number of distinct clusters in datasets. The most common formulation of the regularization function is the additive linear term λk, where k is the number of clusters and λ is a positive coefficient. Currently, there are no principled guidelines for setting the value of the critical hyperparameter λ. In this paper, we derive rigorous bounds for λ assuming clusters are ideal. Ideal clusters (defined as d-dimensional spheres with identical radii) are close proxies for k-means clusters (d-dimensional spherically symmetric distributions with identical standard deviations). Experiments show that the k-means algorithm with an additive regularizer often yields multiple solutions. Thus, we also analyze the k-means algorithm with a multiplicative regularizer. The consensus among k-means solutions with additive and multiplicative regularizations reduces the ambiguity of multiple solutions in certain cases. We also present selected experiments that demonstrate the performance of the regularized k-means algorithms as clusters deviate from the ideal assumption.
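
The additive formulation described in the abstract can be illustrated with a minimal sketch: for each candidate k, run k-means, take the sum of squared errors (SSE), add the penalty λk, and keep the k with the lowest total. Everything specific below is an assumption for illustration and is not taken from the paper: the function names, the toy data (three 2-D discs of radius 1), the value λ = 100, the cap k_max = 6, and the farthest-first initialization, which is used only to keep the example deterministic. The multiplicative regularizer is not sketched because the abstract does not state its form.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic initialization (an assumption, not the paper's choice):
    start from the first point, then repeatedly add the point farthest
    from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans_sse(points, k, n_iter=100):
    """Run Lloyd's algorithm and return the sum of squared errors."""
    centers = farthest_first(points, k)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Update step: move each center to its group's mean
        # (an empty group keeps its previous center).
        updated = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if updated == centers:
            break
        centers = updated
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def select_k_additive(points, lam, k_max=6):
    """Pick the k in 1..k_max that minimizes SSE(k) + lam * k."""
    scores = {k: kmeans_sse(points, k) + lam * k for k in range(1, k_max + 1)}
    return min(scores, key=scores.get), scores


def ideal_clusters(centers, radius=1.0, n_per_cluster=30, seed=0):
    """Toy 'ideal' clusters: points drawn uniformly from discs of equal radius."""
    rng = random.Random(seed)
    points = []
    for cx, cy in centers:
        for _ in range(n_per_cluster):
            r = radius * math.sqrt(rng.random())
            t = 2 * math.pi * rng.random()
            points.append((cx + r * math.cos(t), cy + r * math.sin(t)))
    return points


points = ideal_clusters([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)])
best_k, scores = select_k_additive(points, lam=100.0)
print(best_k)  # prints 3 for this well-separated toy data
```

In this toy setting λ must be larger than the SSE gained by splitting a true cluster and smaller than the SSE lost by merging two of them. A value outside that window over- or under-counts the clusters, which is the sensitivity that the paper's bounds on λ are meant to address.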
