K-Nearest Neighbor Approximation Via the Friend-of-a-Friend Principle

20 August 2019

Abstract

Suppose $V$ is an $n$ -element set where for each $x \in V$ , the elements of $V \setminus \{x\}$ are ranked by their similarity to $x$ . The $K$ -nearest neighbor graph % $G:=(V, E)$ is a directed graph including an arc from each $x$ to the $K$ points of $V \setminus \{x\}$ most similar to $x$ . Constructive approximation to this graph using far fewer than $n^2$ comparisons is important for the analysis of large high-dimensional data sets. \emph{ $K$ -Nearest Neighbor Descent} is a parameter-free heuristic where a sequence of graph approximations is constructed, in which second neighbors in one approximation are proposed as neighbors in the next. We provide a rigorous justification for $O( n \log{n} )$ complexity of a similar algorithm, using range queries, when applied to a homogeneous Poisson process in suitable dimension, but show that the basic algorithm fails to achieve subquadratic complexity on sets whose similarity rankings arise from a "generic" linear order on the $\binom{n}{2}$ inter-point distances in a metric space.

View on arXiv

Comments on this paper