A statistical interpretation of spectral embedding: the generalised random dot product graph

A generalisation of a latent position network model known as the random dot product graph is considered. We show that, whether the normalised Laplacian or adjacency matrix is used, the vector representations of nodes obtained by spectral embedding, using the largest eigenvalues by magnitude, provide strongly consistent latent position estimates with asymptotically Gaussian error, up to indefinite orthogonal transformation. The mixed membership and standard stochastic block models constitute special cases where the latent positions live respectively inside or on the vertices of a simplex, crucially, without assuming the underlying block connectivity probability matrix is positive semidefinite. Estimation via spectral embedding can therefore be achieved by respectively estimating this simplicial support, or fitting a Gaussian mixture model. In the latter case, the use of K-means (with Euclidean distance), as has been previously recommended, is suboptimal and, for identifiability reasons, unsound. Indeed, Euclidean distances and angles are not preserved under indefinite orthogonal transformation, and we show stochastic block model examples where such quantities vary appreciably. Empirical improvements in link prediction (over the random dot product graph), as well as the potential to uncover richer latent structure (than posited under the mixed membership or standard stochastic block models), are demonstrated in a cyber-security example.
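To make the embedding and clustering points concrete, the following is a minimal sketch, not taken from the paper: a two-block stochastic block model with an indefinite connectivity matrix B is simulated, adjacency spectral embedding is computed from the eigenvalues largest in magnitude, and a full-covariance Gaussian mixture is fitted in place of K-means. All specifics (n = 400, two blocks, the matrix B, the hyperbolic map Q) are illustrative assumptions.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Simulate a 2-block SBM whose connectivity matrix B is indefinite:
# eigenvalues of B are 0.5 and -0.3, so B is not positive semidefinite.
B = np.array([[0.1, 0.4],
              [0.4, 0.1]])
z = rng.integers(0, 2, size=400)          # block memberships
P = B[z][:, z]                            # edge probability matrix
A = (rng.random(P.shape) < np.triu(P, 1)).astype(float)
A = A + A.T                               # symmetric, hollow adjacency matrix

# Adjacency spectral embedding into R^{p+q} with p = q = 1 (d = 2),
# using the d eigenvalues largest in magnitude.
vals, vecs = np.linalg.eigh(A)
idx = np.argsort(np.abs(vals))[::-1][:2]
X_hat = vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

# Fit a Gaussian mixture (rather than Euclidean K-means) to recover blocks.
labels = GaussianMixture(n_components=2, covariance_type="full",
                         random_state=0).fit_predict(X_hat)

# Why Euclidean K-means is unsound here: an indefinite orthogonal matrix Q,
# satisfying Q^T diag(1, -1) Q = diag(1, -1), gives an equally valid set of
# latent positions X_hat @ Q but distorts Euclidean distances and angles.
t = 1.0
Q = np.array([[np.cosh(t), np.sinh(t)],
              [np.sinh(t), np.cosh(t)]])
X_rot = X_hat @ Q

The design point, per the abstract, is that the family of full-covariance Gaussian mixtures is closed under such invertible linear maps, so the fitted clustering is unaffected by the choice of Q, whereas K-means assignments based on Euclidean distance are not.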