A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares

23 June 2014

Abstract

We consider statistical aspects of solving large-scale least-squares (LS) problems using randomized sketching algorithms. Prior work has typically adopted an \emph{algorithmic perspective}, in that it has made no statistical assumptions on the input $X$ and $Y$ , and instead it has assumed that the data $(X,Y)$ are fixed and worst-case. In this paper, we adopt a \emph{statistical perspective}, and we consider the mean-squared error performance of randomized sketching algorithms, when data $(X, Y)$ are generated according to a statistical linear model $Y = X \beta + \epsilon$ , where $\epsilon$ is a noise process. To do this, we first develop a framework for assessing, in a unified manner, algorithmic and statistical aspects of randomized sketching methods. We then consider the statistical predicition efficiency (SPE) and the statistical residual efficiency (SRE) of the sketched LS estimator; and we use our framework to provide results for several types of random projection and random sampling sketching algorithms. Among other results, we show that the SRE can be bounded when $p \lesssim r \ll n$ but that the SPE typically requires the sample size $r$ to be substantially larger. Our theoretical results reveal that, depending on the specifics of the situation, leverage-based sampling methods can perform as well as or better than projection methods. Our empirical results reveal that when $r$ is only slightly greater than $p$ and much less than $n$ , projection-based methods out-perform sampling-based methods, but as $r$ grows, sampling methods start to out-perform projection methods.

View on arXiv

Comments on this paper