Dropping Convexity for Faster Semi-definite Optimization

A matrix $X$ is positive semi-definite (PSD) if and only if it can be written as the product $UU^\top$, for some matrix $U$. This paper explores the use of this observation for optimization: specifically, we consider the minimization of a convex function $f$ over the positive semi-definite cone $\{X \succeq 0\}$, but via gradient descent on $f(UU^\top)$, which is a non-convex function of $U$. We focus on the (empirically quite popular) approach where, for computational or statistical reasons, $U$ is set to be an $n \times r$ matrix for some $r \leq n$, and correspondingly $f$ satisfies restricted strong convexity (setting $r = n$ recovers the exact case with global strong convexity). We develop a special choice of step size, and show that updating $U$ via gradient descent with this choice results in linear convergence to the top-$r$ components of the optimum of $f$, provided we start from a point which has constant relative distance to the optimum. We also develop an initialization scheme for the "first-order oracle" setting, i.e. when our only access to the function $f$ is via its value and gradients at specific points.
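The factored update itself is easy to sketch. Below is a minimal Python illustration of gradient descent on $g(U) = f(UU^\top)$ for a toy least-squares objective $f(X) = \tfrac{1}{2}\|X - M\|_F^2$; the choice of objective, the random initialization, and the conservative fixed step size are illustrative assumptions, not the paper's specific step-size rule or initialization scheme.

```python
# Minimal sketch of gradient descent on the factored objective g(U) = f(U U^T).
# The loss f (a PSD least-squares fit to a synthetic target M), the random
# initialization, and the simple step-size heuristic are illustrative choices;
# they are not the paper's construction.
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 5

# Synthetic rank-r PSD target M = U* U*^T (hypothetical test problem).
U_star = rng.standard_normal((n, r))
M = U_star @ U_star.T

def grad_f(X):
    # Gradient of f(X) = 0.5 * ||X - M||_F^2.
    return X - M

def factored_gradient_descent(U0, steps=1500, eta=None):
    U = U0.copy()
    if eta is None:
        # Conservative fixed step size scaled by the initial iterate
        # (placeholder for the paper's specific step-size choice).
        X0 = U0 @ U0.T
        eta = 1.0 / (10.0 * (np.linalg.norm(X0, 2) + np.linalg.norm(grad_f(X0), 2)))
    for _ in range(steps):
        X = U @ U.T
        # Chain rule: the gradient of g(U) = f(U U^T) is (grad f(X) + grad f(X)^T) U.
        G = (grad_f(X) + grad_f(X).T) @ U
        U = U - eta * G
    return U

U0 = rng.standard_normal((n, r))  # naive random initialization
U_hat = factored_gradient_descent(U0)
print("relative error:", np.linalg.norm(U_hat @ U_hat.T - M) / np.linalg.norm(M))
```

Each iteration costs only matrix products involving the thin $n \times r$ factor, which is the computational appeal of working with $U$ rather than the full $n \times n$ variable $X$.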