Memory-Sample Tradeoffs for Linear Regression with Small Error

We consider the problem of performing linear regression over a stream of $d$-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints. Specifically, consider a sequence of labeled examples $(a_1,b_1), (a_2,b_2),\ldots$, with $a_i$ drawn independently from a $d$-dimensional isotropic Gaussian, and where $b_i = \langle a_i, x\rangle + \eta_i$, for a fixed $x \in \mathbb{R}^d$ with $\|x\|_2 = 1$ and with independent noise $\eta_i$ drawn uniformly from the interval $[-2^{-d/5}, 2^{-d/5}]$. We show that any algorithm with at most $d^2/4$ bits of memory requires at least $\Omega(d \log \log \frac{1}{\epsilon})$ samples to approximate $x$ to $\ell_2$ error $\epsilon$ with probability of success at least $2/3$, for $\epsilon$ sufficiently small as a function of $d$. In contrast, for such $\epsilon$, $x$ can be recovered to error $\epsilon$ with probability $1-o(1)$ with memory $O(d^2 \log(1/\epsilon))$ using $d$ examples. This represents the first nontrivial lower bounds for regression with super-linear memory, and may open the door for strong memory/sample tradeoffs for continuous optimization.
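The generative model and the memory-unconstrained baseline can be illustrated with a short simulation. This is a sketch under reconstructed notation, not the paper's code: examples $a_i$ are drawn from an isotropic Gaussian, labels are $b_i = \langle a_i, x\rangle + \eta_i$ with unit-norm $x$ and uniform noise of magnitude at most $2^{-d/5}$, and ordinary least squares on roughly $d$ examples recovers $x$ up to the (tiny) noise level.

```python
import numpy as np

# Sketch of the abstract's data model (notation as reconstructed here).
rng = np.random.default_rng(0)
d = 100  # dimension; noise magnitude 2^(-d/5) is then astronomically small

x = rng.standard_normal(d)
x /= np.linalg.norm(x)  # fixed regressor with ||x||_2 = 1

def sample_stream(n):
    """Draw n labeled examples (a_i, b_i): a_i ~ N(0, I_d),
    b_i = <a_i, x> + eta_i, eta_i ~ Uniform[-2^(-d/5), 2^(-d/5)]."""
    A = rng.standard_normal((n, d))
    eta = rng.uniform(-2.0 ** (-d / 5), 2.0 ** (-d / 5), size=n)
    return A, A @ x + eta

# Memory-unconstrained baseline: with n = d examples, least squares
# recovers x to error limited only by the tiny noise.
A, b = sample_stream(d)
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
err = np.linalg.norm(x_hat - x)
print(f"l2 recovery error: {err:.2e}")
```

A memory-bounded streaming algorithm, by contrast, cannot store the full $d \times d$ system $A^\top A$, which is where the abstract's $d^2/4$-bit lower bound bites.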