Stability of Density-Based Clustering

Abstract

High density clusters can be characterized by the connected components of a level set $L(\lambda) = \{x:\ p(x)>\lambda\}$ of the underlying probability density function $p$ generating the data, at some appropriate level $\lambda\geq 0$. The complete hierarchical clustering can be characterized by a cluster tree ${\cal T}= \bigcup_{\lambda} L(\lambda)$. In this paper, we study the behavior of a density level set estimate $\widehat L(\lambda)$ and cluster tree estimate $\widehat{\cal{T}}$ based on a kernel density estimator with kernel bandwidth $h$. We define two notions of instability to measure the variability of $\widehat L(\lambda)$ and $\widehat{\cal{T}}$ as a function of $h$, and investigate the theoretical properties of these instability measures.
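
As a rough illustration of the objects named in the abstract, the following sketch builds a one-dimensional Gaussian kernel density estimate, thresholds it at a level to obtain an estimated level set, and counts its connected components for several bandwidths. Everything here is an assumption made for illustration: the function names, the Gaussian kernel, the grid, the synthetic two-cluster sample, and in particular `disagreement_length`, which compares level sets from two halves of the sample and is only a simple proxy, not either of the instability measures defined in the paper.

```python
import numpy as np


def gaussian_kde_1d(data, grid, h):
    """Gaussian kernel density estimate evaluated at each grid point."""
    z = (grid[:, None] - data[None, :]) / h
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (len(data) * h)


def level_set_mask(density, lam):
    """Boolean mask of grid points in the estimated level set {p_hat > lam}."""
    return density > lam


def count_components(mask):
    """Count maximal runs of True, i.e. connected components on a 1D grid."""
    padded = np.concatenate(([False], mask))
    return int(np.sum(~padded[:-1] & padded[1:]))


def disagreement_length(mask_a, mask_b, dx):
    """Length of the symmetric difference of two level sets.

    Hypothetical proxy for variability; NOT the paper's instability measure.
    """
    return float(np.sum(mask_a ^ mask_b) * dx)


if __name__ == "__main__":
    # Synthetic sample (assumption): two well-separated Gaussian clusters.
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(2.0, 0.5, 200)])
    rng.shuffle(data)
    half_a, half_b = data[:200], data[200:]

    grid = np.linspace(-5.0, 5.0, 1001)
    dx = grid[1] - grid[0]
    lam = 0.05  # level chosen arbitrarily for the demo

    for h in (0.05, 0.3, 1.0, 3.0):
        full = level_set_mask(gaussian_kde_1d(data, grid, h), lam)
        a = level_set_mask(gaussian_kde_1d(half_a, grid, h), lam)
        b = level_set_mask(gaussian_kde_1d(half_b, grid, h), lam)
        print(f"h={h}: components={count_components(full)}, "
              f"split disagreement={disagreement_length(a, b, dx):.3f}")
```

Running the script prints, for each bandwidth, the number of components of the estimated level set and how much the level sets from the two half-samples disagree. One would expect small bandwidths to give fragmented, variable level sets and large bandwidths to merge the clusters, which is the kind of dependence on $h$ the abstract refers to.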
