Memory-Sample Tradeoffs for Linear Regression with Small Error

Abstract

We consider the problem of performing linear regression over a stream of $d$-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints. Specifically, consider a sequence of labeled examples $(a_1, b_1), (a_2, b_2), \ldots$, with $a_i$ drawn independently from a $d$-dimensional isotropic Gaussian, and where $b_i = \langle a_i, x \rangle + \eta_i$ for a fixed $x \in \mathbb{R}^d$ with $\|x\|_2 = 1$ and independent noise $\eta_i$ drawn uniformly from the interval $[-2^{-d/5}, 2^{-d/5}]$. We show that any algorithm with at most $d^2/4$ bits of memory requires at least $\Omega(d \log \log \frac{1}{\epsilon})$ samples to approximate $x$ to $\ell_2$ error $\epsilon$ with probability of success at least $2/3$, for $\epsilon$ sufficiently small as a function of $d$. In contrast, for such $\epsilon$, $x$ can be recovered to error $\epsilon$ with probability $1 - o(1)$ using $d$ examples and $O(d^2 \log(1/\epsilon))$ bits of memory. These are the first nontrivial lower bounds for regression with super-linear memory, and they may open the door to strong memory/sample tradeoffs for continuous optimization.
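
To make the setting concrete, here is a minimal NumPy sketch of the data model and of the memory-rich baseline the abstract contrasts against: storing $d$ full-precision examples and solving the resulting $d \times d$ linear system. The function name `recover_x` and all parameter choices are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def recover_x(d=20, seed=0):
    """Illustrative recovery in the abstract's model (names/parameters are assumptions)."""
    rng = np.random.default_rng(seed)

    # Hidden unit vector x with ||x||_2 = 1.
    x = rng.standard_normal(d)
    x /= np.linalg.norm(x)

    # d examples: rows a_i ~ N(0, I_d), labels b_i = <a_i, x> + eta_i
    # with eta_i uniform on [-2^{-d/5}, 2^{-d/5}].
    A = rng.standard_normal((d, d))
    eta = rng.uniform(-1.0, 1.0, size=d) * 2.0 ** (-d / 5)
    b = A @ x + eta

    # Memory-rich baseline: store all d examples and solve the square system.
    # Because the noise is exponentially small, x_hat is typically within
    # roughly 2^{-Omega(d)} of x in l2 norm (assuming A is well conditioned).
    x_hat = np.linalg.solve(A, b)
    return np.linalg.norm(x_hat - x)

print(recover_x())  # l2 recovery error, typically tiny
```

Storing the $d$ examples to the precision needed for error $\epsilon$ accounts for the $O(d^2 \log(1/\epsilon))$ bits of memory in the upper bound; the paper's lower bound shows that with only $d^2/4$ bits, no algorithm can match this sample efficiency.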
