arXiv:2207.08268

Online Lewis Weight Sampling

17 July 2022
David P. Woodruff
T. Yasuda
Abstract

The seminal work of Cohen and Peng introduced Lewis weight sampling to the theoretical computer science community, yielding fast row sampling algorithms for approximating $d$-dimensional subspaces of $\ell_p$ up to $(1+\epsilon)$ error. Several works have extended this important primitive to other settings, including the online coreset and sliding window models. However, these results are only for $p\in\{1,2\}$, and results for $p=1$ require a suboptimal $\tilde O(d^2/\epsilon^2)$ samples. In this work, we design the first nearly optimal $\ell_p$ subspace embeddings for all $p\in(0,\infty)$ in the online coreset and sliding window models. In both models, our algorithms store $\tilde O(d^{1\lor(p/2)}/\epsilon^2)$ rows. This answers a substantial generalization of the main open question of [BDMMUWZ2020], and gives the first results for all $p\notin\{1,2\}$. Towards our result, we give the first analysis of "one-shot" Lewis weight sampling, which samples rows proportionally to their Lewis weights, with sample complexity $\tilde O(d^{p/2}/\epsilon^2)$ for $p>2$. Previously, this scheme was only known to have sample complexity $\tilde O(d^{p/2}/\epsilon^5)$, whereas $\tilde O(d^{p/2}/\epsilon^2)$ is known if a more sophisticated recursive sampling is used. The recursive sampling cannot be implemented online, thus necessitating an analysis of one-shot Lewis weight sampling. Our analysis uses a novel connection to online numerical linear algebra. As an application, we obtain the first one-pass streaming coreset algorithms for $(1+\epsilon)$ approximation of important generalized linear models, such as logistic regression and $p$-probit regression.
Our upper bounds are parameterized by a complexity parameter $\mu$ introduced by [MSSW2018], and we give the first lower bounds showing that a linear dependence on $\mu$ is necessary.
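
To make "sampling rows proportionally to their Lewis weights" concrete, the following is a minimal offline sketch and not the paper's online or sliding window algorithm. It assumes the weights are approximated with the fixed-point iteration of Cohen and Peng, which is known to converge only for $p<4$, and it keeps row $i$ independently with probability $\min(1, c\,w_i)$, rescaling kept rows by that probability to the power $-1/p$. The function names, the iteration count, and the `oversample` parameter (playing the role of $c$) are illustrative assumptions.

```python
import random

def solve(G, b):
    """Solve G x = b by Gauss-Jordan elimination with partial pivoting."""
    d = len(G)
    M = [row[:] + [b[i]] for i, row in enumerate(G)]
    for c in range(d):
        piv = max(range(c, d), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(d):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, d + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][d] / M[i][i] for i in range(d)]

def lewis_weights(A, p, iters=40):
    """Approximate l_p Lewis weights of the rows of A (assumes p < 4,
    full column rank, and no zero rows)."""
    n, d = len(A), len(A[0])
    w = [d / n] * n
    for _ in range(iters):
        # Gram matrix G = A^T W^{1 - 2/p} A
        s = [wi ** (1.0 - 2.0 / p) for wi in w]
        G = [[sum(s[i] * A[i][j] * A[i][k] for i in range(n))
              for k in range(d)] for j in range(d)]
        # Update w_i <- (a_i^T G^{-1} a_i)^{p/2}
        w = [sum(aj * xj for aj, xj in zip(a, solve(G, a))) ** (p / 2.0)
             for a in A]
    return w

def one_shot_sample(A, p, oversample, rng):
    """Keep row i with probability min(1, oversample * w_i) and rescale it
    so that the sampled p-th power norm is unbiased."""
    w = lewis_weights(A, p)
    sample = []
    for a, wi in zip(A, w):
        prob = min(1.0, oversample * wi)
        if rng.random() < prob:
            scale = prob ** (-1.0 / p)
            sample.append([scale * x for x in a])
    return sample
```

At the fixed point the weights sum to $d$, so the expected number of kept rows is at most `oversample` times $d$; the bounds in the abstract correspond to particular choices of this oversampling factor that the sketch does not attempt to reproduce.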
