arXiv:2307.00530

Massively Parallel Algorithms for the Stochastic Block Model

2 July 2023
Zelin Li
Pan Peng
Xianbin Zhu
Abstract

Learning the community structure of a large-scale graph is a fundamental problem in machine learning, computer science and statistics. We study the problem of exactly recovering the communities in a graph generated from the Stochastic Block Model (SBM) in the Massively Parallel Computation (MPC) model. Specifically, given $kn$ vertices that are partitioned into $k$ equal-sized clusters (i.e., each has size $n$), a graph on these $kn$ vertices is randomly generated such that each pair of vertices is connected with probability $p$ if they are in the same cluster and with probability $q$ if not, where $p > q > 0$. We give MPC algorithms for the SBM in the (very general) $s$-space MPC model, where each machine has memory $s = \Omega(\log n)$. Under the condition that $\frac{p-q}{\sqrt{p}} \geq \tilde{\Omega}\left(k^{\frac{1}{2}} n^{-\frac{1}{2}+\frac{1}{2(r-1)}}\right)$ for any integer $r \in [3, O(\log n)]$, our first algorithm exactly recovers all the $k$ clusters in $O(kr \log_s n)$ rounds using $\tilde{O}(m)$ total space, or in $O(r \log_s n)$ rounds using $\tilde{O}(km)$ total space. If $\frac{p-q}{\sqrt{p}} \geq \tilde{\Omega}\left(k^{\frac{3}{4}} n^{-\frac{1}{4}}\right)$, our second algorithm achieves $O(\log_s n)$ rounds and $\tilde{O}(m)$ total space complexity. Both algorithms significantly improve upon a recent result of Cohen-Addad et al. [PODC'22], who gave algorithms that only work in the sublinear space MPC model, where each machine has local memory $s = O(n^{\delta})$ for some constant $\delta > 0$, with a much stronger condition on $p, q, k$. Our algorithms are based on collecting the $r$-step neighborhood of each vertex and comparing the difference of some statistical information generated from the local neighborhoods for each pair of vertices. To implement the clustering algorithms in parallel, we present efficient approaches for implementing some basic graph operations in the $s$-space MPC model.
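
To make the input model concrete, the following is a minimal single-machine sketch of how a graph could be sampled from the SBM described above: $kn$ vertices, $k$ planted clusters of size $n$, edge probability $p$ inside a cluster and $q$ across clusters. It only illustrates the generative model, not the paper's MPC algorithms. The function name `sample_sbm`, the labeling convention (vertex `v` belongs to cluster `v // n`), and the `seed` parameter are assumptions made for this illustration.

```python
import random
from itertools import combinations


def sample_sbm(n, k, p, q, seed=None):
    """Sample a graph on k*n vertices with k planted clusters of size n.

    Returns (cluster, edges), where cluster[v] is the planted cluster of
    vertex v and edges is a set of pairs (u, v) with u < v.
    """
    rng = random.Random(seed)
    # Assumed labeling convention: vertex v belongs to cluster v // n.
    cluster = [v // n for v in range(k * n)]
    edges = set()
    for u, v in combinations(range(k * n), 2):
        # Same cluster: connect with probability p; otherwise with q.
        prob = p if cluster[u] == cluster[v] else q
        if rng.random() < prob:
            edges.add((u, v))
    return cluster, edges


# Usage: sample a small instance and count intra-cluster edges.
cluster, edges = sample_sbm(n=50, k=3, p=0.5, q=0.05, seed=0)
intra = sum(cluster[u] == cluster[v] for u, v in edges)
print(len(edges), intra)
```

The quadratic loop over all vertex pairs is fine for small experiments. An exact-recovery algorithm would receive only `edges` and would have to reconstruct `cluster` up to a relabeling of the clusters.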
