Randomized Near Neighbor Graphs, Giant Components, and Applications in Data Science

13 November 2017
G. Linderman
Zhengchao Wan
Y. Kluger
Stefan Steinerberger
Abstract

If we pick $n$ random points uniformly in $[0,1]^d$ and connect each point to its $k$-nearest neighbors, then it is well known that there exists a giant connected component with high probability. We prove that in $[0,1]^d$ it suffices to connect every point to $c_{d,1} \log\log n$ points chosen randomly among its $c_{d,2} \log n$-nearest neighbors to ensure a giant component of size $n - o(n)$ with high probability. This construction yields a much sparser random graph with $\sim n \log\log n$ instead of $\sim n \log n$ edges that has comparable connectivity properties. This result has nontrivial implications for problems in data science where an affinity matrix is constructed: instead of picking the $k$-nearest neighbors, one can often pick $k' \ll k$ random points out of the $k$-nearest neighbors without sacrificing efficiency. This can massively simplify and accelerate computation; we illustrate this with several numerical examples.
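The construction described in the abstract — for each point, take a pool of its $\sim c_{d,2} \log n$ nearest neighbors and keep only $\sim c_{d,1} \log\log n$ of them chosen at random — can be sketched as follows. This is an illustrative implementation, not the authors' code: the constants `c1` and `c2` and the helper name `randomized_nn_graph` are placeholders for the theoretical $c_{d,1}$, $c_{d,2}$, which depend on the dimension.

```python
import numpy as np
from scipy.spatial import cKDTree

def randomized_nn_graph(points, c1=2.0, c2=2.0, rng=None):
    """Sketch of the sparse randomized near-neighbor graph.

    For each point: find its ~c2 * log(n) nearest neighbors,
    then keep edges to only ~c1 * log(log(n)) of them, chosen
    uniformly at random. Constants are illustrative.
    """
    rng = np.random.default_rng(rng)
    n = len(points)
    # Candidate pool size ~ c2 log n, clipped to valid range.
    K = min(n - 1, max(1, int(np.ceil(c2 * np.log(n)))))
    # Number of random edges kept per point ~ c1 log log n.
    kp = min(K, max(1, int(np.ceil(c1 * np.log(np.log(n))))))
    tree = cKDTree(points)
    # Query K+1 neighbors; column 0 is each point itself.
    _, idx = tree.query(points, k=K + 1)
    edges = set()
    for i in range(n):
        chosen = rng.choice(idx[i, 1:], size=kp, replace=False)
        for j in chosen:
            edges.add((min(i, int(j)), max(i, int(j))))
    return edges
```

The resulting undirected graph has at most $n \cdot \lceil c_1 \log\log n \rceil$ edges rather than the $\sim n \log n$ required by a full $k$-nearest-neighbor graph at the connectivity threshold, which is the source of the computational savings for affinity-matrix constructions.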
