Massively Parallel Algorithms for the Stochastic Block Model

Abstract

Learning the community structure of a large-scale graph is a fundamental problem in machine learning, computer science and statistics. We study the problem of exactly recovering the communities in a graph generated from the Stochastic Block Model (SBM) in the Massively Parallel Computation (MPC) model. Specifically, given $kn$ vertices that are partitioned into $k$ equal-sized clusters (i.e., each has size $n$), a graph on these $kn$ vertices is randomly generated such that each pair of vertices is connected with probability~$p$ if they are in the same cluster and with probability $q$ if not, where $p > q > 0$. We give MPC algorithms for the SBM in the (very general) \emph{$s$-space MPC model}, where each machine has memory $s=\Omega(\log n)$. Under the condition that $\frac{p-q}{\sqrt{p}}\geq \tilde{\Omega}(k^{\frac12}n^{-\frac12+\frac{1}{2(r-1)}})$ for any integer $r\in [3,O(\log n)]$, our first algorithm exactly recovers all the $k$ clusters in $O(kr\log_s n)$ rounds using $\tilde{O}(m)$ total space, or in $O(r\log_s n)$ rounds using $\tilde{O}(km)$ total space. If $\frac{p-q}{\sqrt{p}}\geq \tilde{\Omega}(k^{\frac34}n^{-\frac14})$, our second algorithm achieves $O(\log_s n)$ rounds and $\tilde{O}(m)$ total space complexity. Both algorithms significantly improve upon a recent result of Cohen-Addad et al. [PODC'22], who gave algorithms that only work in the \emph{sublinear space MPC model}, where each machine has local memory~$s=O(n^{\delta})$ for some constant $\delta>0$, with a much stronger condition on $p,q,k$. Our algorithms are based on collecting the $r$-step neighborhood of each vertex and comparing the difference of some statistical information generated from the local neighborhoods for each pair of vertices. To implement the clustering algorithms in parallel, we present efficient approaches for implementing some basic graph operations in the $s$-space MPC model.
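To make the generative model concrete, the following is a minimal single-machine sketch in Python of sampling an SBM instance as described above ($k$ clusters of size $n$, intra-cluster edge probability $p$, inter-cluster edge probability $q$), together with a plain breadth-first collection of the $r$-step neighborhood of one vertex. The function names `sample_sbm` and `r_step_neighborhood` are hypothetical and introduced only for illustration. This is not the MPC algorithm of the paper: it enumerates all vertex pairs sequentially and does not compute the neighborhood statistics that the recovery algorithms compare.

```python
import random
from collections import deque


def sample_sbm(k, n, p, q, seed=0):
    """Sample an SBM graph on k*n vertices (hypothetical helper).

    Vertex v is assigned to cluster v // n, so every cluster has size n.
    Each pair is joined with probability p inside a cluster and q across
    clusters. Returns (labels, adjacency dict of sets).
    """
    rng = random.Random(seed)
    total = k * n
    labels = [v // n for v in range(total)]
    adj = {v: set() for v in range(total)}
    for u in range(total):
        for v in range(u + 1, total):
            prob = p if labels[u] == labels[v] else q
            if rng.random() < prob:
                adj[u].add(v)
                adj[v].add(u)
    return labels, adj


def r_step_neighborhood(adj, source, r):
    """Return the set of vertices within distance r of source.

    Sequential BFS, truncated at depth r. This is an illustrative
    stand-in, not the parallel procedure from the paper.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue  # do not expand beyond depth r
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)
```

A possible call, with parameter values chosen arbitrarily for illustration, is `labels, adj = sample_sbm(k=2, n=50, p=0.5, q=0.05)` followed by `r_step_neighborhood(adj, 0, 3)`. The quadratic pair enumeration is acceptable only for small instances; it is used here because it mirrors the model definition directly.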
