Almost Linear Constant-Factor Sketching for $\ell_1$ and Logistic Regression

Abstract

We improve upon previous oblivious sketching and turnstile streaming results for $\ell_1$ and logistic regression, giving a much smaller sketching dimension achieving $O(1)$-approximation and yielding an efficient optimization problem in the sketch space. Namely, we achieve for any constant $c>0$ a sketching dimension of $\tilde{O}(d^{1+c})$ for $\ell_1$ regression and $\tilde{O}(\mu d^{1+c})$ for logistic regression, where $\mu$ is a standard measure that captures the complexity of compressing the data. For $\ell_1$-regression our sketching dimension is near-linear and improves previous work, which either required $\Omega(\log d)$-approximation with this sketching dimension, or required a larger $\operatorname{poly}(d)$ number of rows. Similarly, for logistic regression previous work had worse $\operatorname{poly}(\mu d)$ factors in its sketching dimension. We also give a tradeoff that yields a $1+\varepsilon$ approximation in input sparsity time by increasing the total size to $(d\log(n)/\varepsilon)^{O(1/\varepsilon)}$ for $\ell_1$ and to $(\mu d\log(n)/\varepsilon)^{O(1/\varepsilon)}$ for logistic regression. Finally, we show that our sketch can be extended to approximate a regularized version of logistic regression where the data-dependent regularizer corresponds to the variance of the individual logistic losses.
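To make the sketch-and-solve paradigm behind these results concrete, here is a minimal illustration for $\ell_1$ regression using a dense i.i.d. Cauchy sketch, the classical oblivious $\ell_1$ embedding. This is an assumption-laden toy, not the paper's construction (whose sketch achieves the smaller $\tilde{O}(d^{1+c})$ dimension); the IRLS solver and all parameters below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 2000, 5, 80  # tall input, small sketching dimension m << n

# Synthetic instance: b = A x_true plus a few gross outliers,
# the regime where the l1 objective is preferable to l2.
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true
b[rng.choice(n, size=20, replace=False)] += 50.0

def l1_cost(x):
    # Objective evaluated on the ORIGINAL (unsketched) data.
    return np.abs(A @ x - b).sum()

def irls_l1(M, y, iters=50, eps=1e-8):
    # Iteratively reweighted least squares: a simple heuristic l1 solver,
    # reweighting each row by 1/|residual| at every step.
    x = np.linalg.lstsq(M, y, rcond=None)[0]
    for _ in range(iters):
        w = 1.0 / np.sqrt(np.maximum(np.abs(M @ x - y), eps))
        x = np.linalg.lstsq(M * w[:, None], y * w, rcond=None)[0]
    return x

# Oblivious sketch: drawn without looking at the data, applied linearly,
# so it also works one update at a time in a turnstile stream.
S = rng.standard_cauchy((m, n)) / m
x_sketch = irls_l1(S @ A, S @ b)   # solve the small m x d problem
x_direct = irls_l1(A, b)           # solve on the full data, for comparison

ratio = l1_cost(x_sketch) / l1_cost(x_direct)
print(f"l1 cost ratio (sketched / direct): {ratio:.3f}")
```

The point of the comparison is that the solution recovered from the $m \times d$ sketched problem attains an original-data $\ell_1$ cost within a constant factor of solving on all $n$ rows, mirroring the $O(1)$-approximation guarantee discussed above.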
