
Which Sampling Densities are Suitable for Spectral Clustering on Unbounded Domains?

Abstract

We consider a random geometric graph with vertices sampled from a probability measure supported on $\mathbb{R}^d$, and study its connectivity. We show that the graph is typically disconnected unless the sampling density has superexponential decay. In the latter setting, we identify an asymptotic threshold value for the radius parameter of the graph such that, for radius values beyond the threshold, some concentration properties hold for the sampled points of the graph, while the graph is disconnected for radius values below the same threshold. Properties of point processes are well known to be closely related to the analysis of geometric learning problems, such as spectral clustering. This work can be seen as a first step towards understanding the consistency of spectral clustering when the probability measure has unbounded support. In particular, we narrow down the setting under which spectral clustering algorithms on $\mathbb{R}^d$ may be expected to achieve consistency to a sufficiently fast (superexponential) decay of the sampling density and a radius parameter that decays sufficiently slowly as a function of $n$, the number of sampled points.
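
As a rough illustration of the object studied here, the following minimal sketch samples points in $\mathbb{R}^d$ and checks whether the resulting random geometric graph is connected for a few radius values. It is not the paper's construction: the standard Gaussian is used only as one convenient example of a density with superexponential decay, the edge rule (join two points when their Euclidean distance is at most the radius) is an assumption, and the names `sample_points` and `is_connected` are hypothetical helpers introduced for this example.

```python
import math
import random


def sample_points(n, d, rng):
    """Draw n points in R^d with i.i.d. standard Gaussian coordinates.

    Assumption: the Gaussian stands in for a generic density with
    superexponential decay; it is not taken from the paper.
    """
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


def is_connected(points, radius):
    """Return True if the graph joining points at distance <= radius is connected.

    Assumption: edges use the rule |x_i - x_j| <= radius. Connectivity is
    checked with a depth-first search from vertex 0, at O(n^2) cost.
    """
    n = len(points)
    if n == 0:
        return True
    seen = {0}
    stack = [0]
    while stack:
        i = stack.pop()
        for j in range(n):
            if j not in seen and math.dist(points[i], points[j]) <= radius:
                seen.add(j)
                stack.append(j)
    return len(seen) == n


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    pts = sample_points(200, 2, rng)
    for r in (0.05, 0.5, 5.0):
        print(f"radius={r}: connected={is_connected(pts, r)}")
```

Repeating the experiment with a heavier-tailed sampler in place of `sample_points`, and with the radius shrinking as `n` grows, gives an informal way to observe how isolated far-out points break connectivity.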
