Energy Landscape for large average submatrix detection problems in Gaussian random matrices

Combinatorial optimization problems such as finding submatrices with large average value within a large data matrix arise in a wide array of fields, ranging from statistical genetics, bioinformatics, and computer science to various social sciences. These techniques play an important role in revealing substructures and associations with interesting characteristics in high-dimensional problems. In this paper we analyze such problems in an idealized setting where the underlying matrix is a large Gaussian random matrix, and we provide detailed asymptotics for various characteristics of the energy landscape. For a fixed submatrix size, we provide a structure theorem for the submatrix with the largest average. We then show that for any given threshold, the size of the largest square submatrix with average bigger than that threshold satisfies a two-point concentration phenomenon. Finding such submatrices for a fixed size is a computationally intensive problem. We study the natural algorithm that attempts to find submatrices with large average; such algorithms typically converge to a local optimum. We prove a structure theorem for such locally optimal submatrices and derive refined asymptotics for the mean and the variance of the number of such local optima. In particular for and , the order of the means are and , while the variances are and , respectively, with logarithmic corrections. We develop a new variant of Stein's method to prove a Gaussian Central Limit Theorem for for all finite .
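The abstract does not spell out the search algorithm, so the following is only a minimal sketch of one plausible reading: an alternating row and column update on a Gaussian random matrix. The function name `alternating_search`, its parameters, and the stopping rule are all assumptions made for illustration, not the authors' stated procedure. The sketch stops at a submatrix whose rows are the best choice for its columns and whose columns are the best choice for its rows, which is one natural notion of a local optimum.

```python
import numpy as np

def alternating_search(W, k, rng, max_iter=1000):
    """Hypothetical alternating search for a k x k submatrix with large average.

    Assumed procedure (not taken from the source): start from k random
    columns, then alternately pick the k rows and the k columns with the
    largest sums until the column set stops changing.
    """
    n_rows, n_cols = W.shape
    cols = np.sort(rng.choice(n_cols, size=k, replace=False))
    for _ in range(max_iter):
        # Best k rows for the current columns.
        rows = np.sort(np.argsort(W[:, cols].sum(axis=1))[-k:])
        # Best k columns for those rows.
        new_cols = np.sort(np.argsort(W[rows, :].sum(axis=0))[-k:])
        if np.array_equal(new_cols, cols):
            break  # fixed point: rows and columns are mutually optimal
        cols = new_cols
    return rows, cols, W[np.ix_(rows, cols)].mean()

rng = np.random.default_rng(0)
W = rng.standard_normal((200, 200))  # idealized Gaussian random matrix
rows, cols, avg = alternating_search(W, k=3, rng=rng)
print("rows:", rows, "cols:", cols, "average:", avg)
```

Each update can only keep or increase the submatrix sum, so with continuous entries the loop reaches a fixed point; different random starts generally end at different fixed points, which is why the number of such local optima is a natural quantity to study.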