
Extractor-Based Time-Space Lower Bounds for Learning

Abstract

A matrix $M: A \times X \rightarrow \{-1,1\}$ corresponds to the following learning problem: an unknown element $x \in X$ is chosen uniformly at random, and a learner tries to learn $x$ from a stream of samples $(a_1, b_1), (a_2, b_2), \ldots$, where for every $i$, $a_i \in A$ is chosen uniformly at random and $b_i = M(a_i, x)$.

Assume that $k, \ell, r$ are such that any submatrix of $M$ with at least $2^{-k} \cdot |A|$ rows and at least $2^{-\ell} \cdot |X|$ columns has a bias of at most $2^{-r}$. We show that any learning algorithm for the learning problem corresponding to $M$ requires either a memory of size at least $\Omega(k \cdot \ell)$ or at least $2^{\Omega(r)}$ samples. The result holds even if the learner has an exponentially small success probability (of $2^{-\Omega(r)}$).

In particular, this shows that for a large class of learning problems, any learning algorithm requires either a memory of size at least $\Omega((\log |X|) \cdot (\log |A|))$ or an exponential number of samples, achieving a tight $\Omega((\log |X|) \cdot (\log |A|))$ lower bound on the size of the memory, rather than the bound of $\Omega(\min\{(\log |X|)^2, (\log |A|)^2\})$ obtained in previous works [R17, MM17b].

Moreover, our result implies all previous memory-samples lower bounds, as well as a number of new applications. Our proof builds on [R17], which gave a general technique for proving memory-samples lower bounds.
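As an illustrative instance (a standard example; the specific parameter choices below are assumed for the sake of illustration), take $A = X = \{0,1\}^n$ and $M(a,x) = (-1)^{\langle a, x \rangle}$, i.e., parity learning. By Lindsey's lemma, every submatrix with at least $2^{-k} \cdot |A|$ rows and at least $2^{-\ell} \cdot |X|$ columns has bias at most $2^{(k+\ell-n)/2}$, so one may take $k = \ell = r = n/4$ (up to constant factors). The theorem then recovers the bound that any learner for parities requires either a memory of size $\Omega(n^2)$ or $2^{\Omega(n)}$ samples.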
