Convergence of Alternating Gradient Descent for Matrix Factorization

We consider alternating gradient descent (AGD) with fixed step size η, applied to the asymmetric matrix factorization objective. We show that, for a rank-r matrix A ∈ ℝ^{m×n}, T = C (σ₁(A)/σᵣ(A))² log(1/ε) iterations of alternating gradient descent suffice to reach an ε-optimal factorization ‖A − X_T Y_Tᵀ‖²_F ≤ ε‖A‖²_F with high probability, starting from an atypical random initialization. The factors have rank d ≥ r, so that X_T ∈ ℝ^{m×d} and Y_T ∈ ℝ^{n×d}. Experiments suggest that our proposed initialization is not merely of theoretical benefit, but significantly improves the convergence of gradient descent in practice. Our proof is conceptually simple: a uniform PL inequality and a uniform Lipschitz smoothness constant are guaranteed for a sufficient number of iterations, starting from our random initialization. Our proof method should be useful for extending and simplifying convergence analyses for a broader class of nonconvex low-rank factorization problems.
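To make the setting concrete, here is a minimal NumPy sketch of alternating gradient descent on the objective ‖A − XYᵀ‖²_F: each iteration takes one gradient step in X with Y held fixed, then one in Y with X held fixed. The dimensions, step size, and Gaussian initialization below are illustrative choices, not the paper's specific "atypical" initialization or its prescribed step size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: rank-r target A, rank-d factors with d >= r.
m, n, r, d = 30, 20, 3, 5
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r matrix

# Generic Gaussian initialization (assumption: NOT the paper's initialization).
X = rng.standard_normal((m, d)) / np.sqrt(d)
Y = rng.standard_normal((n, d)) / np.sqrt(d)

eta = 1e-2  # fixed step size (illustrative choice)
for _ in range(2000):
    R = X @ Y.T - A          # residual with Y fixed
    X = X - eta * R @ Y      # gradient step in X
    R = X @ Y.T - A          # residual with updated X
    Y = Y - eta * R.T @ X    # gradient step in Y

rel_err = np.linalg.norm(X @ Y.T - A, "fro") / np.linalg.norm(A, "fro")
print(rel_err)
```

With mild overparameterization (d > r) as above, the relative Frobenius error typically falls well below 1% within a few thousand iterations, though the rate depends on the condition number σ₁(A)/σᵣ(A) and the step size.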