
Dropping Convexity for Faster Semi-definite Optimization

Srinadh Bhojanapalli, Anastasios Kyrillidis, and Sujay Sanghavi
Abstract

A matrix $X \in \mathbb{R}^{n \times n}$ is positive semi-definite (PSD) if and only if it can be written as the product $UU^\top$, for some matrix $U$. This paper explores the use of this observation for optimization: specifically, we consider the minimization of a convex function $f$ over the positive semi-definite cone $X \succeq 0$, but via gradient descent on $f(UU^\top)$, which is a non-convex function of $U$. We focus on the (empirically quite popular) approach where, for computational or statistical reasons, $U$ is set to be an $n \times r$ matrix for some $r \leq n$, and correspondingly $f$ satisfies restricted strong convexity (setting $r = n$ recovers the exact case with global strong convexity). We develop a special choice of step size, and show that updating $U$ via gradient descent with this choice results in linear convergence to the top-$r$ components of the optimum of $f$, provided we start from a point within constant relative distance of the optimum. We also develop an initialization scheme for the "first-order oracle" setting, i.e., when our only access to the function is via its value and gradients at specific points.
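The factored update described in the abstract can be sketched in a few lines: each iteration works on the $n \times r$ factor $U$ rather than the full $n \times n$ matrix $X$, so no projection onto the PSD cone is needed. Below is a minimal, illustrative sketch assuming a toy quadratic objective $f(X) = \tfrac{1}{2}\|X - M\|_F^2$; the fixed step size and random initialization here are simple heuristics for this toy case, not the paper's specific step-size rule or initialization scheme.

```python
import numpy as np

# Sketch of gradient descent on the factored objective g(U) = f(U U^T).
# Assumed toy objective: f(X) = 0.5 * ||X - M||_F^2 for a rank-r PSD target M.
# Step size and initialization are heuristics, not the paper's prescribed choices.

def factored_gradient_descent(grad_f, U0, step, iters=1000):
    """Minimize f(U U^T) over n x r factors U by plain gradient descent."""
    U = U0.copy()
    for _ in range(iters):
        G = grad_f(U @ U.T)           # gradient of f at X = U U^T
        U = U - step * (G + G.T) @ U  # chain rule: d/dU f(U U^T) = (G + G^T) U
    return U

rng = np.random.default_rng(0)
n, r = 50, 3
A = rng.standard_normal((n, r))
M = A @ A.T                           # rank-r PSD target matrix

grad_f = lambda X: X - M              # gradient of 0.5 * ||X - M||_F^2
U0 = 0.1 * rng.standard_normal((n, r))
step = 0.1 / np.linalg.norm(M, 2)     # heuristic fixed step size

U_hat = factored_gradient_descent(grad_f, U0, step)
print("relative error:", np.linalg.norm(U_hat @ U_hat.T - M) / np.linalg.norm(M))
```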
