Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time

We propose an input sparsity time sampling algorithm that can spectrally approximate the Gram matrix corresponding to the q-fold column-wise tensor product of q matrices using a nearly optimal number of samples, improving upon all previously known methods by poly(q) factors. Furthermore, for the important special case of the q-fold self-tensoring of a dataset X, which is the feature matrix of the degree-q polynomial kernel, the leading term of our method's runtime is proportional to the size of the input dataset and has no dependence on q. Previous techniques either incur a poly(q) factor slowdown in their runtime or remove the dependence on q at the expense of a sub-optimal target dimension, and their runtime depends quadratically on the number of data points. Our sampling technique relies on a collection of q partially correlated random projections that can be simultaneously applied to a dataset X in total time depending only on the size of X, while at the same time their q-fold Kronecker product acts as a near-isometry for any fixed vector in the column span of X^{⊗q}. We also show that our sampling methods generalize to other classes of kernels beyond polynomial, such as Gaussian and Neural Tangent kernels.
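To make the Kronecker-product sketching idea concrete, here is a minimal numerical sketch. It is a simplification, not the paper's method: it uses q fully independent Gaussian projections (the paper constructs partially correlated ones to get better guarantees) and checks the near-isometry property on a single tensor-product vector x^{⊗q}. The key point it illustrates is the identity (S_1 ⊗ … ⊗ S_q)(x ⊗ … ⊗ x) = (S_1 x) ⊗ … ⊗ (S_q x), which lets the Kronecker-product sketch be applied without ever forming the d^q-dimensional tensor:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, q = 8, 2000, 3  # data dimension, sketch size per factor, tensoring degree

x = rng.standard_normal(d)

# q independent Gaussian sketches S_i in R^{m x d}, scaled so E||S_i x||^2 = ||x||^2.
S = [rng.standard_normal((m, d)) / np.sqrt(m) for _ in range(q)]

# (S_1 ⊗ ... ⊗ S_q)(x ⊗ ... ⊗ x) = (S_1 x) ⊗ ... ⊗ (S_q x), and the norm of a
# Kronecker product of vectors is the product of their norms, so the sketched
# norm of x^{⊗q} is computable from the q small matrix-vector products alone.
sketched_norm = np.prod([np.linalg.norm(Si @ x) for Si in S])
true_norm = np.linalg.norm(x) ** q  # ||x^{⊗q}|| = ||x||^q

print(abs(sketched_norm / true_norm - 1))  # small relative error
```

Note that with independent projections the variance of this estimate grows with q, which is one reason the paper's correlated construction is needed to achieve a nearly optimal sample count.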