Spectral Recovery of Binary Censored Block Models

Abstract

Community detection is the problem of identifying community structure in graphs. Often the graph is modeled as a sample from the Stochastic Block Model, in which each vertex belongs to a community and the probability that two vertices are connected by an edge depends on the communities of those vertices. In this paper, we consider a model of censored community detection with two communities, where most of the data is missing: the status of only a small fraction of the potential edges is revealed. In this model, vertices in the same community are connected with probability $p$, while vertices in opposite communities are connected with probability $q$. The connectivity status of a given pair of vertices $\{u,v\}$ is revealed with probability $\alpha$, independently across all pairs, where $\alpha = \frac{t \log(n)}{n}$. We establish the information-theoretic threshold $t_c(p,q)$, such that no algorithm succeeds in recovering the communities exactly when $t < t_c(p,q)$. We show that when $t > t_c(p,q)$, a simple spectral algorithm based on a weighted, signed adjacency matrix succeeds in recovering the communities exactly. While spectral algorithms are shown to have near-optimal performance in the symmetric case, we show that they may fail in the asymmetric case, where the connection probabilities inside the two communities are allowed to be different. In particular, we show the existence of a parameter regime where a simple two-phase algorithm succeeds but any algorithm based on the top two eigenvectors of the weighted, signed adjacency matrix fails.
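
To make the censored model concrete, the following is a minimal Python sketch of sampling such a graph and labeling vertices from an eigenvector of a weighted, signed adjacency matrix. Several choices here are assumptions rather than details stated in the abstract: the two communities are taken to be of equal size, revealed present edges are encoded as $+1$, revealed absent edges as $-y$ with $y = \log\frac{1-q}{1-p} / \log\frac{p}{q}$, censored pairs as $0$, and only the leading eigenvector is used. The function names (`sample_censored_sbm`, `spectral_labels`, `agreement`) are hypothetical.

```python
import numpy as np


def sample_censored_sbm(n, p, q, t, rng):
    """Sample a signed, weighted matrix from a two-community censored block model.

    Assumptions (not taken from the abstract): balanced communities, p > q,
    and the encoding present -> +1, absent -> -y, censored -> 0.
    """
    # Ground-truth labels: first half +1, second half -1.
    sigma = np.ones(n, dtype=int)
    sigma[n // 2:] = -1

    # Each pair is revealed independently with probability alpha = t*log(n)/n.
    alpha = t * np.log(n) / n

    same = np.equal.outer(sigma, sigma)
    edge_prob = np.where(same, p, q)

    # Work on the strict upper triangle, then symmetrize.
    upper = np.triu(np.ones((n, n), dtype=bool), k=1)
    revealed = (rng.random((n, n)) < alpha) & upper
    present = rng.random((n, n)) < edge_prob

    # Assumed weight for a revealed absent edge.
    y = np.log((1 - q) / (1 - p)) / np.log(p / q)

    A = np.zeros((n, n))
    A[revealed & present] = 1.0
    A[revealed & ~present] = -y
    A = A + A.T
    return A, sigma


def spectral_labels(A):
    """Label vertices by the sign of the eigenvector of the largest eigenvalue."""
    _, eigenvectors = np.linalg.eigh(A)  # eigenvalues in ascending order
    leading = eigenvectors[:, -1]
    return np.where(leading >= 0, 1, -1)


def agreement(labels, sigma):
    """Fraction of correctly labeled vertices, up to a global sign flip."""
    match = np.mean(labels == sigma)
    return max(match, 1.0 - match)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, sigma = sample_censored_sbm(n=400, p=0.7, q=0.3, t=15.0, rng=rng)
    print("agreement:", agreement(spectral_labels(A), sigma))
```

Using only the leading eigenvector is a simplification that suits this balanced example with $p = 1 - q$, where the expected matrix has the community vector as its dominant eigenvector. For other parameters, and in particular in the asymmetric case discussed above, an approach based on the top two eigenvectors or a different algorithm altogether may be needed.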
