
Optimal Sketching for Kronecker Product Regression and Low Rank Approximation

Neural Information Processing Systems (NeurIPS), 2019
Abstract

We study the Kronecker product regression problem, in which the design matrix is a Kronecker product of two or more matrices. Given $A_i \in \mathbb{R}^{n_i \times d_i}$ for $i = 1, 2, \dots, q$, where $n_i \gg d_i$ for each $i$, and $b \in \mathbb{R}^{n_1 n_2 \cdots n_q}$, let $\mathcal{A} = A_1 \otimes A_2 \otimes \cdots \otimes A_q$. Then for $p \in [1, 2]$, the goal is to find $x \in \mathbb{R}^{d_1 \cdots d_q}$ that approximately minimizes $\|\mathcal{A}x - b\|_p$. Recently, Diao, Song, Sun, and Woodruff (AISTATS, 2018) gave an algorithm which is faster than forming the Kronecker product $\mathcal{A}$. Specifically, for $p = 2$ their running time is $O(\sum_{i=1}^q \mathrm{nnz}(A_i) + \mathrm{nnz}(b))$, where $\mathrm{nnz}(A_i)$ is the number of non-zero entries in $A_i$. Note that $\mathrm{nnz}(b)$ can be as large as $n_1 \cdots n_q$. For $p = 1$, $q = 2$, and $n_1 = n_2$, they achieve the worse bound of $O(n_1^{3/2} \, \mathrm{poly}(d_1 d_2) + \mathrm{nnz}(b))$. In this work, we provide significantly faster algorithms. For $p = 2$, our running time is $O(\sum_{i=1}^q \mathrm{nnz}(A_i))$, which has no dependence on $\mathrm{nnz}(b)$. For $p < 2$, our running time is $O(\sum_{i=1}^q \mathrm{nnz}(A_i) + \mathrm{nnz}(b))$, which matches the prior best running time for $p = 2$. We also consider the related all-pairs regression problem: given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, we want to solve $\min_x \|\bar{A}x - \bar{b}\|_p$, where $\bar{A} \in \mathbb{R}^{n^2 \times d}$ and $\bar{b} \in \mathbb{R}^{n^2}$ consist of all pairwise differences of the rows of $A$ and $b$, respectively. We give an $O(\mathrm{nnz}(A))$ time algorithm for $p \in [1, 2]$, improving upon the $\Omega(n^2)$ time needed to form $\bar{A}$. Finally, we initiate the study of Kronecker product low rank and low $t$-rank approximation.
For input $\mathcal{A}$ as above, we give $O(\sum_{i=1}^q \mathrm{nnz}(A_i))$ time algorithms, which are much faster than computing $\mathcal{A}$.
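To see why the Kronecker structure helps, here is a minimal numpy sketch (not the paper's sketching algorithm) for the $p = 2$ case with $q = 2$: using the standard identities $(A_1 \otimes A_2)\,\mathrm{vec}(X) = \mathrm{vec}(A_2 X A_1^T)$ and $(A_1 \otimes A_2)^+ = A_1^+ \otimes A_2^+$, the least-squares solution can be computed from the small factors without ever materializing the $n_1 n_2 \times d_1 d_2$ matrix $\mathcal{A}$. All variable names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, d1, n2, d2 = 30, 3, 20, 2
A1 = rng.standard_normal((n1, d1))
A2 = rng.standard_normal((n2, d2))
b = rng.standard_normal(n1 * n2)

# Exploit the structure: (A1 (x) A2)^+ b = (A1^+ (x) A2^+) b
# = vec(A2^+ B (A1^+)^T), where B is b reshaped to n2 x n1
# (column-major, matching the column-stacking vec convention).
B = b.reshape((n2, n1), order="F")
X = np.linalg.pinv(A2) @ B @ np.linalg.pinv(A1).T
x_fast = X.flatten(order="F")

# Naive baseline: form the full Kronecker product and solve directly.
A = np.kron(A1, A2)
x_naive, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_fast, x_naive)
```

This exact solve still reads all of $b$; the paper's contribution for $p = 2$ is a sketching algorithm whose running time avoids even the $\mathrm{nnz}(b)$ term.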
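For the all-pairs regression problem, a small sanity check makes the $\Omega(n^2)$ bottleneck concrete. The snippet below (again not the paper's algorithm, and valid only for $p = 2$) uses the elementary identity $\sum_{i,j}(u_i - u_j)^2 = 2n \sum_i (u_i - \bar{u})^2$: for least squares, solving against the mean-centered $A$ and $b$ yields the same minimizer as the $n^2$-row system $\bar{A}x = \bar{b}$, in time proportional to the size of $A$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Naive: materialize all n^2 pairwise row differences.
Abar = (A[:, None, :] - A[None, :, :]).reshape(n * n, d)
bbar = (b[:, None] - b[None, :]).reshape(n * n)
x_naive, *_ = np.linalg.lstsq(Abar, bbar, rcond=None)

# For p = 2 only: centering A and b gives the same minimizer,
# since the all-pairs objective equals 2n times the centered one.
Ac = A - A.mean(axis=0)
bc = b - b.mean()
x_fast, *_ = np.linalg.lstsq(Ac, bc, rcond=None)
assert np.allclose(x_naive, x_fast)
```

For general $p \in [1, 2]$ no such closed-form reduction applies, which is where the paper's $O(\mathrm{nnz}(A))$ sketching algorithm comes in.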
