
Exploiting Numerical Sparsity for Efficient Learning: Faster Eigenvector Computation and Regression

Abstract

In this paper, we obtain improved running times for regression and top eigenvector computation for numerically sparse matrices. Given a data matrix $A \in \mathbb{R}^{n \times d}$ where every row $a \in \mathbb{R}^d$ has $\|a\|_2^2 \leq L$ and numerical sparsity at most $s$, i.e. $\|a\|_1^2 / \|a\|_2^2 \leq s$, we provide faster algorithms for these problems in many parameter settings. For top eigenvector computation, we obtain a running time of $\tilde{O}(nd + r(s + \sqrt{r s}) / \mathrm{gap}^2)$, where $\mathrm{gap} > 0$ is the relative gap between the top two eigenvalues of $A^\top A$ and $r$ is the stable rank of $A$. This running time improves upon the previous best unaccelerated running time of $O(nd + r d / \mathrm{gap}^2)$, as it is always the case that $r \leq d$ and $s \leq d$. For regression, we obtain a running time of $\tilde{O}(nd + (nL / \mu) \sqrt{s nL / \mu})$, where $\mu > 0$ is the smallest eigenvalue of $A^\top A$. This running time improves upon the previous best unaccelerated running time of $\tilde{O}(nd + n L d / \mu)$. This result expands the regimes where regression can be solved in nearly linear time from when $L/\mu = \tilde{O}(1)$ to when $L / \mu = \tilde{O}(d^{2/3} / (sn)^{1/3})$. Furthermore, we obtain similar improvements even when row norms and numerical sparsities are non-uniform, and we show how to achieve even faster running times by accelerating using approximate proximal point [Frostig et al. 2015] / catalyst [Lin et al. 2015]. Our running times depend only on the size of the input and natural numerical measures of the matrix, i.e. eigenvalues and $\ell_p$ norms, making progress on a key open problem regarding optimal running times for efficient large-scale learning.
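The quantities that appear in these bounds can be computed directly for a small dense matrix. The sketch below is illustrative only and is not the paper's algorithm: it assumes dense NumPy arrays with nonzero rows and $d \geq 2$, uses a full eigendecomposition of $A^\top A$ (affordable only for small examples), and assumes the standard definitions of stable rank, $r = \|A\|_F^2 / \|A\|_2^2$, and relative gap, $(\lambda_1 - \lambda_2)/\lambda_1$. The function names `numerical_sparsity` and `matrix_parameters` are hypothetical.

```python
import numpy as np


def numerical_sparsity(a: np.ndarray) -> float:
    """Return ||a||_1^2 / ||a||_2^2 for a nonzero vector a.

    By Cauchy-Schwarz this lies in [1, nnz(a)], so it never exceeds
    the ordinary sparsity (number of nonzeros) of a.
    """
    l1 = float(np.abs(a).sum())
    l2_sq = float(a @ a)
    return l1 ** 2 / l2_sq


def matrix_parameters(A: np.ndarray) -> dict:
    """Compute s, L, r, gap and mu for a dense matrix A whose rows are data points.

    Assumes every row is nonzero and A has at least two columns. Uses a full
    eigendecomposition of A^T A, so it is only meant for small examples.
    """
    row_norm_sq = np.einsum("ij,ij->i", A, A)  # ||a_i||_2^2 for each row
    # eigvalsh returns the eigenvalues of the symmetric matrix A^T A in
    # ascending order; reverse so that eigs[0] is the largest.
    eigs = np.linalg.eigvalsh(A.T @ A)[::-1]
    return {
        "s": max(numerical_sparsity(row) for row in A),  # max row numerical sparsity
        "L": float(row_norm_sq.max()),                   # max squared row norm
        "r": float(row_norm_sq.sum() / eigs[0]),         # stable rank ||A||_F^2 / ||A||_2^2
        "gap": float((eigs[0] - eigs[1]) / eigs[0]),     # (lambda_1 - lambda_2) / lambda_1
        "mu": float(eigs[-1]),                           # smallest eigenvalue of A^T A
    }


if __name__ == "__main__":
    # The columns of A are orthogonal, so A^T A = diag(6, 2).
    A = np.array([[1.0, 1.0], [1.0, -1.0], [2.0, 0.0]])
    print(matrix_parameters(A))
```

In the example, $A^\top A = \mathrm{diag}(6, 2)$, which gives $s = 2$, $L = 4$, $r = 4/3$, $\mathrm{gap} = 2/3$ and $\mu = 2$. Because $\|a\|_1^2 / \|a\|_2^2 \leq \mathrm{nnz}(a) \leq d$, the numerical sparsity $s$ is never larger than $d$, which is why bounds in terms of $s$ can only match or beat bounds in terms of $d$.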
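The nearly-linear-time threshold for regression follows from asking when the second term of each bound is at most the $nd$ input-size term, ignoring constants and logarithmic factors. For the new bound:

```latex
\[
\frac{nL}{\mu}\sqrt{\frac{snL}{\mu}} = \sqrt{s}\left(\frac{nL}{\mu}\right)^{3/2} \le nd
\iff \left(\frac{nL}{\mu}\right)^{3/2} \le \frac{nd}{\sqrt{s}}
\iff \frac{nL}{\mu} \le \frac{(nd)^{2/3}}{s^{1/3}}
\iff \frac{L}{\mu} \le \frac{d^{2/3}}{(sn)^{1/3}},
\]
\[
\text{whereas for the previous bound:}\qquad
\frac{nLd}{\mu} \le nd \iff \frac{L}{\mu} \le 1 .
\]
```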

