
Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity

Abstract

Many fundamental problems in machine learning can be formulated as the convex program \[ \min_{\theta\in\mathbb{R}^d}\ \sum_{i=1}^{n}f_{i}(\theta), \] where each $f_i$ is a convex, Lipschitz function supported on a subset of $d_i$ coordinates of $\theta$. One common approach to this problem, exemplified by stochastic gradient descent, involves sampling one $f_i$ term at every iteration to make progress. This approach crucially relies on a notion of uniformity across the $f_i$'s, formally captured by their condition number. In this work, we give an algorithm that minimizes the above convex formulation to $\epsilon$-accuracy in $\widetilde{O}(\sum_{i=1}^n d_i \log(1/\epsilon))$ gradient computations, with no assumptions on the condition number. The previous best algorithm independent of the condition number is the standard cutting plane method, which requires $O(nd \log(1/\epsilon))$ gradient computations. As a corollary, we improve upon the evaluation oracle complexity for decomposable submodular minimization by Axiotis et al. (ICML 2021). Our main technical contribution is an adaptive procedure to select an $f_i$ term at every iteration via a novel combination of cutting-plane and interior-point methods.
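To make the problem setup concrete, the following is a minimal, purely illustrative sketch (in Python/NumPy, not taken from the paper) of a decomposable objective in which each term $f_i$ depends only on a block of $d_i$ coordinates, together with the SGD-style baseline of sampling one term per iteration that the abstract contrasts with. All names, block sizes, and step sizes below are assumptions chosen for illustration.

# A minimal sketch (not the paper's algorithm) of the decomposable setup:
# each f_i is convex and Lipschitz and depends only on a small block of
# d_i coordinates of theta. The baseline below takes subgradient steps on
# randomly sampled terms; all constants are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 5

# Hypothetical terms: f_i(theta) = |a_i . theta[S_i] - b_i| on a coordinate block S_i.
blocks = [rng.choice(d, size=3, replace=False) for _ in range(n)]
coeffs = [rng.normal(size=3) for _ in range(n)]
targets = rng.normal(size=n)

def f_i(theta, i):
    return abs(coeffs[i] @ theta[blocks[i]] - targets[i])

def subgrad_i(theta, i):
    # Subgradient of f_i, nonzero only on the d_i coordinates in blocks[i].
    g = np.zeros(d)
    g[blocks[i]] = np.sign(coeffs[i] @ theta[blocks[i]] - targets[i]) * coeffs[i]
    return g

# Baseline: sample one f_i per iteration (the SGD-style approach the abstract
# contrasts with); its performance depends on uniformity across the f_i's.
theta = np.zeros(d)
for t in range(1, 2001):
    i = rng.integers(n)
    theta -= (0.1 / np.sqrt(t)) * subgrad_i(theta, i)

print("objective:", sum(f_i(theta, i) for i in range(n)))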
