Energy Landscape for large average submatrix detection problems in Gaussian random matrices

Abstract

Combinatorial optimization problems such as finding submatrices with large average value within a large data matrix arise in a wide array of fields, ranging from statistical genetics, bioinformatics, and computer science to various social sciences. These techniques play an important role in revealing substructures and associations with interesting characteristics in high-dimensional problems. In this paper we analyze such problems in an idealized setting where the underlying matrix is a large Gaussian random matrix, and provide detailed asymptotics for various characteristics of the energy landscape. For fixed $k$ we provide a structure theorem for the $k\times k$ submatrix with the largest average. We then show that for any given $\tau > 0$, the size of the largest square submatrix with average bigger than $\tau$ satisfies a two-point concentration phenomenon. Finding such submatrices for a fixed $k$ is a computationally intensive problem. We study the natural algorithm that attempts to find submatrices with large average; such algorithms typically converge to a local optimum. We prove a structure theorem for such locally optimal submatrices and derive refined asymptotics for the mean and the variance of $L_n(k)$, the number of such local optima. In particular, for $k=2$ and $k=3$, the means are of order $n^2$ and $n^3$, while the variances are of order $n^{8/3}$ and $n^{9/2}$, respectively, with logarithmic corrections. We develop a new variant of Stein's method to prove a Gaussian Central Limit Theorem for $L_n(k)$ for all finite $k$.
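
The abstract does not spell out the search procedure, so the following is only a minimal sketch of one plausible reading: an alternating search that fixes the current columns, picks the k rows with the largest sums over them, then fixes those rows and picks the best k columns, repeating until nothing changes. The function names `submatrix_average` and `local_search`, the random starting columns, and the stopping rule are illustrative assumptions, not the paper's definitions.

```python
import random


def submatrix_average(matrix, rows, cols):
    """Average of the entries of `matrix` indexed by rows x cols."""
    total = sum(matrix[i][j] for i in rows for j in cols)
    return total / (len(rows) * len(cols))


def local_search(matrix, k, rng):
    """Alternating row/column search for a k x k submatrix with large average.

    Assumed procedure (not taken from the paper): start from k random
    columns, then repeatedly pick the k rows with the largest sums over the
    current columns and the k columns with the largest sums over those rows.
    Stops at a fixed point, i.e. a local optimum of this search.
    """
    n = len(matrix)
    cols = sorted(rng.sample(range(n), k))
    rows = []
    while True:
        # Best k rows given the current columns.
        row_sums = [sum(matrix[i][j] for j in cols) for i in range(n)]
        new_rows = sorted(
            sorted(range(n), key=lambda i: row_sums[i], reverse=True)[:k]
        )
        # Best k columns given the rows just chosen.
        col_sums = [sum(matrix[i][j] for i in new_rows) for j in range(n)]
        new_cols = sorted(
            sorted(range(n), key=lambda j: col_sums[j], reverse=True)[:k]
        )
        if new_rows == rows and new_cols == cols:
            return rows, cols
        rows, cols = new_rows, new_cols
```

Called on an n×n matrix of independent standard Gaussian entries, for example one built with `random.Random(0).gauss(0, 1)`, the function returns row and column index sets where no row swap or column swap alone can raise the average. Different starting columns generally lead to different local optima, which is the quantity that $L_n(k)$ counts under this reading.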
