Dropping Convexity for Faster Semi-definite Optimization

A matrix $X$ is positive semi-definite (PSD) if and only if it can be written as the product $UU^\top$, for some matrix $U$. This paper explores the use of this observation for optimization: specifically, we consider the minimization of a convex function $f$ over the positive semi-definite cone $\{X \succeq 0\}$, but via gradient descent on $f(UU^\top)$, which is a non-convex function of $U$. We focus on the (empirically quite popular) approach where, for computational or statistical reasons, $U$ is set to be an $n \times r$ matrix for some $r \leq n$, and correspondingly $f$ satisfies restricted strong convexity (setting $r = n$ recovers the exact case with global strong convexity). We develop a special choice of step size, and show that updating $U$ via gradient descent with this choice results in linear convergence to the top-$r$ components of the optimum of $f$, provided we start from a point which has constant relative distance to the optimum. We also develop an initialization scheme for the "first-order oracle" setting, i.e. when our only access to the function $f$ is via its value and gradients at specific points.
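The factored update itself is easy to sketch. Below is a minimal Python illustration of gradient descent on $g(U) = f(UU^\top)$ for a toy least-squares objective $f(X) = \tfrac{1}{2}\|X - M\|_F^2$; the choice of objective, the random initialization, and the conservative fixed step size are illustrative assumptions, not the paper's specific step-size rule or initialization scheme.

```python
# Minimal sketch of gradient descent on the factored objective g(U) = f(U U^T).
# The loss f (a PSD least-squares fit to a synthetic target M), the random
# initialization, and the simple step-size heuristic are illustrative choices;
# they are not the paper's construction.
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 5

# Synthetic rank-r PSD target M = U* U*^T (hypothetical test problem).
U_star = rng.standard_normal((n, r))
M = U_star @ U_star.T

def grad_f(X):
    # Gradient of f(X) = 0.5 * ||X - M||_F^2.
    return X - M

def factored_gradient_descent(U0, steps=1500, eta=None):
    U = U0.copy()
    if eta is None:
        # Conservative fixed step size scaled by the initial iterate
        # (placeholder for the paper's specific step-size choice).
        X0 = U0 @ U0.T
        eta = 1.0 / (10.0 * (np.linalg.norm(X0, 2) + np.linalg.norm(grad_f(X0), 2)))
    for _ in range(steps):
        X = U @ U.T
        # Chain rule: the gradient of g(U) = f(U U^T) is (grad f(X) + grad f(X)^T) U.
        G = (grad_f(X) + grad_f(X).T) @ U
        U = U - eta * G
    return U

U0 = rng.standard_normal((n, r))  # naive random initialization
U_hat = factored_gradient_descent(U0)
print("relative error:", np.linalg.norm(U_hat @ U_hat.T - M) / np.linalg.norm(M))
```

Each iteration costs only matrix products involving the thin $n \times r$ factor, which is the computational appeal of working with $U$ rather than the full $n \times n$ variable $X$.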