Sharp optimal recovery in the two-component Gaussian Mixture Model

Abstract

This paper studies the problem of clustering in the two-component Gaussian mixture model where the centers are separated by $2\Delta$ for some $\Delta>0$. We characterize the exact phase transition threshold, given by $\bar{\Delta}_n^{2} = \sigma^{2}\left(1 + \sqrt{1+\frac{2p}{n\log{n}}} \right)\log{n}$, such that perfect recovery of the communities is possible with high probability if $\Delta\ge(1+\varepsilon)\bar{\Delta}_n$, and impossible if $\Delta\le(1-\varepsilon)\bar{\Delta}_n$, for any constant $\varepsilon>0$. This implies an elbow effect at a critical dimension $p^{*}=n\log{n}$. We present a non-asymptotic lower bound for the corresponding minimax Hamming risk that improves on existing results. It is, to our knowledge, the first lower bound capturing the right dependence on $p$. We also propose an efficient and adaptive procedure that is minimax rate optimal. The rate optimality is moreover sharp in the asymptotics when the sample size goes to infinity. Our procedure is based on a variant of Lloyd's iterations, a popular clustering algorithm widely used by practitioners, initialized by a spectral method. Numerical studies confirm our theoretical findings.
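To make the threshold and the elbow effect concrete, the following minimal sketch evaluates the squared threshold exactly as written in the abstract. The function names `recovery_threshold_sq` and `regime` are hypothetical helpers introduced here for illustration, not part of the paper, and the default `sigma = 1.0` is an assumption.

```python
import math


def recovery_threshold_sq(n: int, p: int, sigma: float = 1.0) -> float:
    """Squared threshold sigma^2 * (1 + sqrt(1 + 2p / (n log n))) * log n.

    n is the sample size, p the dimension, sigma the noise level.
    This only evaluates the formula stated in the abstract.
    """
    if n < 2 or p < 1 or sigma <= 0:
        raise ValueError("need n >= 2, p >= 1 and sigma > 0")
    log_n = math.log(n)
    return sigma ** 2 * (1.0 + math.sqrt(1.0 + 2.0 * p / (n * log_n))) * log_n


def regime(n: int, p: int) -> str:
    """Locate p relative to the critical dimension p* = n log n."""
    p_star = n * math.log(n)
    return "low-dimensional" if p <= p_star else "high-dimensional"


if __name__ == "__main__":
    n = 1000
    for p in (10, 1_000, 100_000, 10_000_000):
        t = recovery_threshold_sq(n, p)
        print(f"p={p:>10d}  regime={regime(n, p):<16s}  threshold^2={t:.3f}")
```

Under these assumptions, the two regimes can be read off the formula: when $p \ll n\log{n}$ the square root is close to 1 and the squared threshold is approximately $2\sigma^{2}\log{n}$, while for $p \gg n\log{n}$ it behaves like $\sigma^{2}\sqrt{2p\log{n}/n}$, so the dependence on $p$ only becomes visible past $p^{*}$.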

