Estimating the number of communities is one of the fundamental problems in the stochastic block model. We re-examine the Bayesian paradigm for stochastic block models and propose a "corrected Bayesian information criterion" to determine the number of blocks, and we show that the resulting estimator is consistent. The novel penalty function improves on those used in Wang and Bickel (2016) and Saldana, Yu and Feng (2016), which tend to underestimate and overestimate the number of blocks, respectively. Along the way, we establish the Wilks theorem for the stochastic block model. Our results show that consistent model selection for stochastic block models requires a so-called "consistency condition". For a homogeneous network, this condition requires that p/q be sufficiently large, where p and q are the within-community and between-community connection probabilities, respectively. Our analysis can also be extended to the degree-corrected stochastic block model. Numerical studies support our theoretical results.
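To make the homogeneous setting concrete, the following is a minimal sketch that samples a network in which every within-community pair connects with probability p and every between-community pair with probability q, then recovers the empirical ratio p/q from the true labels. It does not implement the corrected Bayesian information criterion itself. The function names (`sample_homogeneous_sbm`, `empirical_densities`) and the parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def sample_homogeneous_sbm(n, n_blocks, p, q, seed=0):
    """Sample an undirected, loop-free homogeneous SBM (illustrative helper)."""
    rng = np.random.default_rng(seed)
    # Assign each node to one of n_blocks communities uniformly at random.
    labels = rng.integers(0, n_blocks, size=n)
    same = labels[:, None] == labels[None, :]
    # Edge probability is p within a community and q between communities.
    probs = np.where(same, p, q)
    # Draw the strict upper triangle only, then mirror it to get symmetry.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(int)
    return adj, labels


def empirical_densities(adj, labels):
    """Edge densities within and between communities, given known labels."""
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    within = adj[same & off_diag].mean()
    between = adj[~same].mean()
    return within, between


# Assumed example values: 300 nodes, 3 communities, p = 0.30, q = 0.05.
adj, labels = sample_homogeneous_sbm(n=300, n_blocks=3, p=0.30, q=0.05)
p_hat, q_hat = empirical_densities(adj, labels)
print(f"p_hat={p_hat:.3f}, q_hat={q_hat:.3f}, ratio={p_hat / q_hat:.2f}")
```

With p/q well above 1, as in this example, the community structure is pronounced; as p/q approaches 1 the blocks become harder to tell apart, which is the regime the consistency condition rules out.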