arXiv:1902.08472

Model-based clustering in very high dimensions via adaptive projections

22 February 2019
B. Taschler
F. Dondelinger
S. Mukherjee
Abstract

Mixture models are a standard approach to dealing with heterogeneous data with non-i.i.d. structure. However, when the dimension $p$ is large relative to the sample size $n$ and either or both of the means and covariances/graphical models may differ between the latent groups, mixture models face statistical and computational difficulties, and currently available methods cannot realistically go beyond $p \sim 10^4$ or so. We propose an approach called Model-based Clustering via Adaptive Projections (MCAP). Instead of estimating mixtures in the original space, we work with a low-dimensional representation obtained by linear projection. The projection dimension itself plays an important role and governs a type of bias-variance tradeoff with respect to recovery of the relevant signals. MCAP sets the projection dimension automatically in a data-adaptive manner, using a proxy for the assignment risk. Combining a full covariance formulation with the adaptive projection allows detection of both mean and covariance signals in very high-dimensional problems. We show real-data examples in which covariance signals are reliably detected in problems with $p \sim 10^4$ or more, and simulations going up to $p = 10^6$. In some examples, MCAP performs well even when the mean signal is entirely removed, leaving differential covariance structure in the high-dimensional space as the only signal. Across a number of regimes, MCAP performs as well as or better than a range of existing methods, including a recently proposed $\ell_1$-penalized approach, and performance remains broadly stable with increasing dimension. MCAP can be run "out of the box" and is fast enough for interactive use on large-$p$ problems using standard desktop computing resources.
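
To see why a linear projection can retain both mean and covariance signals, it may help to write out what happens to a mixture under projection. The following is a sketch under stated assumptions and is not taken from the paper: it assumes Gaussian components, and the symbols $A$ (a $q \times p$ projection matrix), $q$ (the projection dimension), $K$, $\pi_k$, $\mu_k$ and $\Sigma_k$ are notation introduced here for illustration only.

```latex
% Assumed model in the original space (Gaussian components are an assumption):
%   x \in \mathbb{R}^p, latent group z \in \{1, \dots, K\}
x \mid z = k \;\sim\; \mathcal{N}(\mu_k, \Sigma_k), \qquad \Pr(z = k) = \pi_k .

% A linear map of a Gaussian vector is Gaussian, so for y = A x with
% A \in \mathbb{R}^{q \times p} and q \ll p:
y \mid z = k \;\sim\; \mathcal{N}\!\left(A \mu_k,\; A \Sigma_k A^{\top}\right),
\qquad
f(y) \;=\; \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(y;\; A \mu_k,\; A \Sigma_k A^{\top}\right).

% Free parameters per component with a full covariance matrix:
\underbrace{p + \tfrac{p(p+1)}{2}}_{\text{original space}}
\quad \longrightarrow \quad
\underbrace{q + \tfrac{q(q+1)}{2}}_{\text{projected space}} .
```

Under these assumptions, two groups remain distinguishable after projection whenever $A\mu_k \neq A\mu_l$ or $A\Sigma_k A^{\top} \neq A\Sigma_l A^{\top}$, so a difference in covariance alone can in principle survive even when the means coincide. The parameter count also suggests one way to read the tradeoff described in the abstract: a small $q$ leaves few parameters to estimate but risks projecting the signal away, while a large $q$ keeps more signal at the cost of noisier estimates.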
