
Optimal Sketching for Kronecker Product Regression and Low Rank Approximation

Neural Information Processing Systems (NeurIPS), 2019
Abstract

We study the Kronecker product regression problem, in which the design matrix is a Kronecker product of two or more matrices. Given $A_i \in \mathbb{R}^{n_i \times d_i}$ for $i = 1, 2, \dots, q$, where $n_i \gg d_i$ for each $i$, and $b \in \mathbb{R}^{n_1 n_2 \cdots n_q}$, let $\mathcal{A} = A_1 \otimes A_2 \otimes \cdots \otimes A_q$. Then for $p \in [1, 2]$, the goal is to find $x \in \mathbb{R}^{d_1 \cdots d_q}$ that approximately minimizes $\|\mathcal{A}x - b\|_p$. Recently, Diao, Song, Sun, and Woodruff (AISTATS, 2018) gave an algorithm which is faster than forming the Kronecker product $\mathcal{A}$. Specifically, for $p = 2$ their running time is $O(\sum_{i=1}^q \mathrm{nnz}(A_i) + \mathrm{nnz}(b))$, where $\mathrm{nnz}(A_i)$ is the number of non-zero entries in $A_i$. Note that $\mathrm{nnz}(b)$ can be as large as $n_1 \cdots n_q$. For $p = 1$, $q = 2$, and $n_1 = n_2$, they achieve the worse bound of $O(n_1^{3/2} \, \mathrm{poly}(d_1 d_2) + \mathrm{nnz}(b))$. In this work, we provide significantly faster algorithms. For $p = 2$, our running time is $O(\sum_{i=1}^q \mathrm{nnz}(A_i))$, which has no dependence on $\mathrm{nnz}(b)$. For $p < 2$, our running time is $O(\sum_{i=1}^q \mathrm{nnz}(A_i) + \mathrm{nnz}(b))$, which matches the prior best running time for $p = 2$. We also consider the related all-pairs regression problem: given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, we want to solve $\min_x \|\bar{A}x - \bar{b}\|_p$, where $\bar{A} \in \mathbb{R}^{n^2 \times d}$ and $\bar{b} \in \mathbb{R}^{n^2}$ consist of all pairwise differences of the rows of $A$ and $b$, respectively. We give an $O(\mathrm{nnz}(A))$ time algorithm for $p \in [1, 2]$, improving upon the $\Omega(n^2)$ time needed to form $\bar{A}$. Finally, we initiate the study of Kronecker product low rank and low $t$-rank approximation.
For input $\mathcal{A}$ as above, we give $O(\sum_{i=1}^q \mathrm{nnz}(A_i))$ time algorithms, which are much faster than computing $\mathcal{A}$.
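To see why the Kronecker structure helps, here is a minimal numpy sketch (not the paper's sketching algorithm) for the $p = 2$ case with $q = 2$: using the standard identities $(A_1 \otimes A_2)\,\mathrm{vec}(X) = \mathrm{vec}(A_2 X A_1^T)$ and $(A_1 \otimes A_2)^+ = A_1^+ \otimes A_2^+$, the least-squares solution can be computed from the small factors without ever materializing the $n_1 n_2 \times d_1 d_2$ matrix $\mathcal{A}$. All variable names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, d1, n2, d2 = 30, 3, 20, 2
A1 = rng.standard_normal((n1, d1))
A2 = rng.standard_normal((n2, d2))
b = rng.standard_normal(n1 * n2)

# Exploit the structure: (A1 (x) A2)^+ b = (A1^+ (x) A2^+) b
# = vec(A2^+ B (A1^+)^T), where B is b reshaped to n2 x n1
# (column-major, matching the column-stacking vec convention).
B = b.reshape((n2, n1), order="F")
X = np.linalg.pinv(A2) @ B @ np.linalg.pinv(A1).T
x_fast = X.flatten(order="F")

# Naive baseline: form the full Kronecker product and solve directly.
A = np.kron(A1, A2)
x_naive, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_fast, x_naive)
```

This exact solve still reads all of $b$; the paper's contribution for $p = 2$ is a sketching algorithm whose running time avoids even the $\mathrm{nnz}(b)$ term.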
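For the all-pairs regression problem, a small sanity check makes the $\Omega(n^2)$ bottleneck concrete. The snippet below (again not the paper's algorithm, and valid only for $p = 2$) uses the elementary identity $\sum_{i,j}(u_i - u_j)^2 = 2n \sum_i (u_i - \bar{u})^2$: for least squares, solving against the mean-centered $A$ and $b$ yields the same minimizer as the $n^2$-row system $\bar{A}x = \bar{b}$, in time proportional to the size of $A$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Naive: materialize all n^2 pairwise row differences.
Abar = (A[:, None, :] - A[None, :, :]).reshape(n * n, d)
bbar = (b[:, None] - b[None, :]).reshape(n * n)
x_naive, *_ = np.linalg.lstsq(Abar, bbar, rcond=None)

# For p = 2 only: centering A and b gives the same minimizer,
# since the all-pairs objective equals 2n times the centered one.
Ac = A - A.mean(axis=0)
bc = b - b.mean()
x_fast, *_ = np.linalg.lstsq(Ac, bc, rcond=None)
assert np.allclose(x_naive, x_fast)
```

For general $p \in [1, 2]$ no such closed-form reduction applies, which is where the paper's $O(\mathrm{nnz}(A))$ sketching algorithm comes in.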
