Bagged $k$-Distance for Mode-Based Clustering Using the Probability of Localized Level Sets

Abstract

In this paper, we propose an ensemble learning algorithm named \textit{bagged $k$-distance for mode-based clustering} (\textit{BDMBC}) by putting forward a new measure called the \textit{probability of localized level sets} (\textit{PLLS}), which enables us to find all clusters with varying densities using a single global threshold. On the theoretical side, we show that with a properly chosen number of nearest neighbors $k_D$ in the bagged $k$-distance, the sub-sample size $s$, the number of bagging rounds $B$, and the number of nearest neighbors $k_L$ for the localized level sets, BDMBC achieves optimal convergence rates for mode estimation. It turns out that with a relatively small $B$, the sub-sample size $s$ can be much smaller than the number of training points $n$ at each bagging round, and the number of nearest neighbors $k_D$ can be reduced simultaneously. Moreover, we establish optimal convergence results for the level set estimation of the PLLS in terms of the Hausdorff distance, which shows that BDMBC can find localized level sets for varying densities and thus enjoys local adaptivity. On the practical side, we conduct numerical experiments that empirically verify the effectiveness of BDMBC for mode estimation and level set estimation, demonstrating the promising accuracy and efficiency of our proposed algorithm.
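
To make the construction concrete, the following Python sketch computes the two quantities under assumed definitions: the bagged $k$-distance of a query point is taken here as the average, over $B$ subsamples of size $s$, of the distance to its $k_D$-th nearest neighbor within each subsample, and the PLLS of a point as the fraction of its $k_L$ nearest neighbors whose bagged $k$-distance is no smaller. Both formulas, and all function names, are illustrative assumptions rather than the paper's exact definitions.

    import numpy as np

    def bagged_k_distance(X, query, k_D, s, B, rng):
        # Assumed form of the bagged k-distance: average, over B subsamples of
        # size s drawn without replacement, of the distance from `query` to its
        # k_D-th nearest neighbor within the subsample. (The query itself may
        # appear in a subsample; this small bias is ignored in the sketch.)
        n = X.shape[0]
        total = 0.0
        for _ in range(B):
            idx = rng.choice(n, size=min(s, n), replace=False)
            d = np.linalg.norm(X[idx] - query, axis=1)
            total += np.partition(d, k_D - 1)[k_D - 1]
        return total / B

    def plls(X, k_D=5, s=200, B=10, k_L=20, seed=0):
        # Assumed form of the PLLS: for each point, the fraction of its k_L
        # nearest neighbors whose bagged k-distance is at least as large,
        # i.e. whose density proxy is no higher.
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        kdist = np.array([bagged_k_distance(X, x, k_D, s, B, rng) for x in X])
        scores = np.empty(n)
        for i in range(n):
            d = np.linalg.norm(X - X[i], axis=1)
            nbrs = np.argsort(d)[1:k_L + 1]  # k_L nearest neighbors, excluding the point itself
            scores[i] = np.mean(kdist[nbrs] >= kdist[i])
        return scores

Thresholding the returned PLLS scores at a single global level would then give candidate points of the localized level sets, from which mode-based clusters can be extracted.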
