Optimal Sketching for Kronecker Product Regression and Low Rank Approximation

Abstract

We study the Kronecker product regression problem, in which the design matrix is a Kronecker product of two or more matrices. Given $A_i \in \mathbb{R}^{n_i \times d_i}$ for $i=1,2,\dots,q$ where $n_i \gg d_i$ for each $i$, and $b \in \mathbb{R}^{n_1 n_2 \cdots n_q}$, let $\mathcal{A} = A_1 \otimes A_2 \otimes \cdots \otimes A_q$. Then for $p \in [1,2]$, the goal is to find $x \in \mathbb{R}^{d_1 \cdots d_q}$ that approximately minimizes $\|\mathcal{A}x - b\|_p$. Recently, Diao, Song, Sun, and Woodruff (AISTATS, 2018) gave an algorithm which is faster than forming the Kronecker product $\mathcal{A}$. Specifically, for $p=2$ their running time is $O(\sum_{i=1}^q \text{nnz}(A_i) + \text{nnz}(b))$, where $\text{nnz}(A_i)$ is the number of non-zero entries in $A_i$. Note that $\text{nnz}(b)$ can be as large as $n_1 \cdots n_q$. For $p=1$, $q=2$ and $n_1 = n_2$, they achieve a worse bound of $O(n_1^{3/2} \text{poly}(d_1 d_2) + \text{nnz}(b))$. In this work, we provide significantly faster algorithms. For $p=2$, our running time is $O(\sum_{i=1}^q \text{nnz}(A_i))$, which has no dependence on $\text{nnz}(b)$. For $p<2$, our running time is $O(\sum_{i=1}^q \text{nnz}(A_i) + \text{nnz}(b))$, which matches the prior best running time for $p=2$. We also consider the related all-pairs regression problem, where given $A \in \mathbb{R}^{n \times d}$, $b \in \mathbb{R}^n$, we want to solve $\min_{x} \|\bar{A}x - \bar{b}\|_p$, where $\bar{A} \in \mathbb{R}^{n^2 \times d}$, $\bar{b} \in \mathbb{R}^{n^2}$ consist of all pairwise differences of the rows of $A$, $b$. We give an $O(\text{nnz}(A))$ time algorithm for $p \in [1,2]$, improving the $\Omega(n^2)$ time needed to form $\bar{A}$. Finally, we initiate the study of Kronecker product low rank and low $t$-rank approximation. For input $\mathcal{A}$ as above, we give $O(\sum_{i=1}^q \text{nnz}(A_i))$ time algorithms, which is much faster than computing $\mathcal{A}$.
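To make the problem setup concrete, the following is a minimal NumPy sketch of the $p=2$, $q=2$ case. It is not the sketching algorithm of this paper: it is an exact baseline that relies only on two standard identities, $(A_1 \otimes A_2)^+ = A_1^+ \otimes A_2^+$ and $(M \otimes N)\,\mathrm{vec}(B) = \mathrm{vec}(M B N^\top)$ for row-major $\mathrm{vec}$, so that $\mathcal{A}$ is never formed. The function name `kron_lstsq` and the problem sizes are illustrative assumptions. Note that this baseline still reads every entry of $b$, which is exactly the $\text{nnz}(b)$ dependence the abstract says is avoided for $p=2$.

```python
import numpy as np


def kron_lstsq(A1, A2, b):
    """Solve min_x ||kron(A1, A2) @ x - b||_2 without forming kron(A1, A2).

    Hypothetical helper for illustration only (exact, no sketching).
    Uses pinv(kron(A1, A2)) = kron(pinv(A1), pinv(A2)) together with
    kron(M, N) @ vec(B) = vec(M @ B @ N.T) for row-major vec.
    """
    n1, n2 = A1.shape[0], A2.shape[0]
    B = b.reshape(n1, n2)                      # row-major view of b as n1 x n2
    X = np.linalg.pinv(A1) @ B @ np.linalg.pinv(A2).T   # d1 x d2 solution
    return X.reshape(-1)                       # back to a length d1*d2 vector


# Small random instance with n_i >> d_i (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
A1 = rng.standard_normal((30, 3))
A2 = rng.standard_normal((20, 2))
b = rng.standard_normal(30 * 20)

x_fast = kron_lstsq(A1, A2, b)

# Reference: form the 600 x 6 Kronecker product explicitly and solve.
x_ref, *_ = np.linalg.lstsq(np.kron(A1, A2), b, rcond=None)
```

On this instance `x_fast` and `x_ref` should agree up to floating-point error, since both compute the unique least-squares solution when $A_1$ and $A_2$ have full column rank. The explicit route costs time and memory proportional to $n_1 n_2 d_1 d_2$, which is what motivates avoiding the product in the first place.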
