We provide fast algorithms for overconstrained $\ell_p$ regression and related problems: for an $n \times d$ input matrix $A$ and vector $b \in \mathbb{R}^n$, in $O(nd \log n)$ time we reduce the problem $\min_{x \in \mathbb{R}^d} \|Ax - b\|_p$ to the same problem with input matrix $\tilde{A}$ of dimension $s \times d$ and corresponding $\tilde{b}$ of dimension $s \times 1$. Here, $\tilde{A}$ and $\tilde{b}$ are a coreset for the problem, consisting of sampled and rescaled rows of $A$ and $b$; and $s$ is independent of $n$ and polynomial in $d$. Our results improve on the best previous algorithms when $n \gg d$, for all $p \in [1, \infty)$ except $p = 2$. We also provide a suite of improved results for finding well-conditioned bases via ellipsoidal rounding, illustrating tradeoffs between running time and conditioning quality, including a one-pass conditioning algorithm for general $\ell_p$ problems.

We also provide an empirical evaluation of implementations of our algorithms for $p = 1$, comparing them with related algorithms. Our empirical results show that, in the asymptotic regime, the theory is a very good guide to the practical performance of these algorithms. Our algorithms use our faster constructions of well-conditioned bases for $\ell_p$ spaces and, for $p = 1$, a fast subspace embedding of independent interest that we call the Fast Cauchy Transform: a distribution over matrices $\Pi: \mathbb{R}^n \mapsto \mathbb{R}^{O(d \log d)}$, found obliviously to $A$, that approximately preserves the $\ell_1$ norms: that is, with large probability, simultaneously for all $x$, $\|Ax\|_1 \approx \|\Pi A x\|_1$, with distortion $O(d^{2+\eta} \log d)$, for an arbitrarily small constant $\eta > 0$; and, moreover, $\Pi A$ can be computed in $O(nd \log d)$ time. The techniques underlying our Fast Cauchy Transform include fast Johnson-Lindenstrauss transforms, low-coherence matrices, and rescaling by Cauchy random variables.
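To make the sketch-then-sample pipeline concrete, here is a minimal, illustrative Python sketch of the general approach the abstract describes: embed the column space with Cauchy random variables, use the small sketch to obtain an (approximately) well-conditioned basis, and sample and rescale rows of $A$ and $b$ to form a coreset for $\ell_1$ regression. It uses a dense Cauchy sketch rather than the paper's structured Fast Cauchy Transform, and all function names, constants, and sampling conventions below are our own simplifications, not the paper's exact construction.

```python
import numpy as np

def cauchy_sketch(M, r, rng):
    """Dense Cauchy sketch Pi @ M (a simplification of the structured,
    low-coherence Fast Cauchy Transform; any fixed scaling of Pi is
    irrelevant here because the conditioning step below absorbs it)."""
    n = M.shape[0]
    Pi = rng.standard_cauchy(size=(r, n))
    return Pi @ M

def l1_coreset(A, b, s, rng):
    """Build a sampled-and-rescaled coreset (A_tilde, b_tilde) for
    min_x ||Ax - b||_1; the sketch size and sampling rule are illustrative."""
    n, d = A.shape
    AB = np.hstack([A, b[:, None]])
    # 1. Sketch the column space of [A b] with Cauchy random variables.
    SAB = cauchy_sketch(AB, r=max(2 * (d + 1), 20), rng=rng)
    # 2. QR on the small sketch gives a basis U = [A b] R^{-1} that is
    #    approximately well-conditioned for the l1 column space.
    _, R = np.linalg.qr(SAB)
    U = AB @ np.linalg.inv(R)
    # 3. Row-wise l1 norms of U serve as importance-sampling scores.
    scores = np.abs(U).sum(axis=1)
    probs = np.minimum(1.0, s * scores / scores.sum())
    keep = rng.random(n) < probs
    # 4. Rescale the kept rows so the subsampled objective is unbiased.
    w = 1.0 / probs[keep]
    return A[keep] * w[:, None], b[keep] * w

# Tiny usage example: the reduced problem has far fewer rows than n.
rng = np.random.default_rng(0)
A = rng.standard_normal((100_000, 5))
b = A @ np.ones(5) + 0.1 * rng.standard_normal(100_000)
A_t, b_t = l1_coreset(A, b, s=2_000, rng=rng)
print(A_t.shape, b_t.shape)
```

The reduced pair `(A_t, b_t)` can then be handed to any $\ell_1$ regression solver; the point of the paper's faster conditioning and structured transform is to make steps 1 and 2 run in time nearly linear in $n$.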