arXiv:1711.03431

Can clustering scale sublinearly with its clusters? A variational EM acceleration of GMMs and $k$-means

9 November 2017
D. Forster
Jörg Lücke
Abstract

One iteration of $k$-means or EM for Gaussian mixture models (GMMs) scales linearly with the number of data points $N$, the number of clusters $C$, and the data dimensionality $D$. In this study, we explore whether one iteration of $k$-means or EM for GMMs can scale sublinearly with $C$ at run-time, while the increase of the clustering objective remains effective. The tool we apply for complexity reduction is variational EM, which is typically applied to make training of generative models with exponentially many hidden states tractable. Here, we apply novel theoretical results on truncated variational EM to make tractable clustering algorithms more efficient. The basic idea is the use of a partial variational E-step which reduces the linear complexity of $\mathcal{O}(NCD)$ required for a full E-step to a sublinear complexity. Our main observation is that the linear dependency on $C$ can be reduced to a dependency on a much smaller parameter $G$, related to the cluster neighborhood relationship. We focus on two versions of partial variational EM for clustering: variational GMM, scaling with $\mathcal{O}(NG^2D)$, and variational $k$-means, scaling with $\mathcal{O}(NGD)$ per iteration. Empirical results then show that these algorithms still require comparable numbers of iterations to increase the clustering objective to the same values as $k$-means. For data with many clusters, we consequently observe reductions of the net computational demands between two and three orders of magnitude. More generally, our results provide substantial empirical evidence in favor of clustering to scale sublinearly with $C$.
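
The idea of a partial E-step can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the abstract does not specify in detail. It assumes a simplified scheme in which each data point compares itself only to its current cluster and that cluster's $G - 1$ nearest neighbor clusters, so the assignment step costs $NG$ distance evaluations instead of $NC$. All function names (`neighbor_lists`, `partial_assign`, `update_centers`, `objective`) are hypothetical. The neighbor lists here are built from center-to-center distances, which costs $\mathcal{O}(C^2 D)$ by itself. That is a shortcut taken for brevity, and a genuinely sublinear method would have to obtain the neighborhoods more cheaply.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neighbor_lists(centers, G):
    """For each cluster c, return c plus its G-1 nearest other clusters.

    Assumption of this sketch: neighborhoods come from center-to-center
    distances (an O(C^2 D) shortcut, not taken from the paper).
    """
    C = len(centers)
    lists = []
    for c in range(C):
        others = sorted(
            (j for j in range(C) if j != c),
            key=lambda j: sq_dist(centers[c], centers[j]),
        )
        lists.append([c] + others[: G - 1])
    return lists


def partial_assign(points, centers, assign, G):
    """Partial E-step: each point evaluates only G candidate centers.

    Returns the new assignments and the number of distance evaluations.
    With G equal to the number of clusters this is the full k-means E-step.
    """
    nbrs = neighbor_lists(centers, G)
    new_assign, n_dist = [], 0
    for x, c in zip(points, assign):
        candidates = nbrs[c]  # always contains the current cluster c
        n_dist += len(candidates)
        new_assign.append(min(candidates, key=lambda j: sq_dist(x, centers[j])))
    return new_assign, n_dist


def update_centers(points, centers, assign):
    """M-step: move each center to the mean of its assigned points."""
    D = len(centers[0])
    sums = [[0.0] * D for _ in centers]
    counts = [0] * len(centers)
    for x, c in zip(points, assign):
        counts[c] += 1
        for d in range(D):
            sums[c][d] += x[d]
    # Empty clusters keep their previous center.
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else list(centers[c])
        for c in range(len(centers))
    ]


def objective(points, centers, assign):
    """k-means objective: summed squared distance to the assigned centers."""
    return sum(sq_dist(x, centers[c]) for x, c in zip(points, assign))


if __name__ == "__main__":
    random.seed(1)
    pts = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(500)]
    ctr = [list(p) for p in pts[:25]]
    # One full E-step (G = C) to initialise, then partial steps with G = 4.
    assign, _ = partial_assign(pts, ctr, [0] * len(pts), G=len(ctr))
    for it in range(5):
        ctr = update_centers(pts, ctr, assign)
        assign, n_dist = partial_assign(pts, ctr, assign, G=4)
        print(it, n_dist, objective(pts, ctr, assign))
```

Because each candidate list always contains the point's current cluster, a partial step can never move a point to a more distant center, so the objective in this sketch does not increase from one iteration to the next. Whether the same number of iterations suffices as for full $k$-means is the empirical question the abstract addresses.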
