
arXiv:2410.22872

Data subsampling for Poisson regression with pth-root-link

30 October 2024
Han Cheng Lie
Alexander Munteanu
Abstract

We develop and analyze data subsampling techniques for Poisson regression, the standard model for count data $y\in\mathbb{N}$. In particular, we consider the Poisson generalized linear model with identity (ID) and square root-link functions. We consider the method of coresets, which are small weighted subsets that approximate the loss function of Poisson regression up to a factor of $1\pm\varepsilon$. We show $\Omega(n)$ lower bounds against coresets for Poisson regression that continue to hold against arbitrary data reduction techniques up to logarithmic factors. By introducing a novel complexity parameter and a domain shifting approach, we show that sublinear coresets with a $1\pm\varepsilon$ approximation guarantee exist when the complexity parameter is small. In particular, the dependence on the number of input points can be reduced to polylogarithmic. We show that the dependence on other input parameters can also be bounded sublinearly, though not always logarithmically. In particular, we show that the square root-link admits an $O(\log(y_{\max}))$ dependence, where $y_{\max}$ denotes the largest count present in the data, while the ID-link requires a $\Theta(\sqrt{y_{\max}/\log(y_{\max})})$ dependence. As an auxiliary result for proving the tightness of the bound with respect to $y_{\max}$ in the case of the ID-link, we show an improved bound on the principal branch $W_0$ of the Lambert W function, which may be of independent interest. We further show the limitations of our analysis when $p$th degree root-link functions for $p\geq 3$ are considered, which indicate that other analytical or computational methods would be required if such a generalization is even possible.
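As a rough illustration of the objects involved (not the authors' coreset construction), the Python sketch below writes out the per-point Poisson negative log-likelihood under a $p$th-root link, assuming the usual parametrization $\mu = (x^\top\beta)^p$ with $x^\top\beta > 0$, so that $p = 1$ is the ID-link and $p = 2$ the square root-link. It then compares the full loss with a weighted subsample. All function names, the synthetic data, and the uniform-sampling baseline are hypothetical choices made for this example; uniform sampling gives an unbiased estimate but carries none of the $1\pm\varepsilon$ guarantees discussed above.

```python
import math
import random

def poisson_nll(z, y, p):
    """Negative log-likelihood of a count y under the p-th root link.

    Assumed parametrization: mean mu = z**p, where z = x . beta > 0.
    p = 1 gives the ID-link, p = 2 the square root-link.
    """
    if z <= 0:
        raise ValueError("linear predictor z must be positive")
    # -log P(y | mu) = mu - y*log(mu) + log(y!), with log(mu) = p*log(z)
    return z ** p - y * p * math.log(z) + math.lgamma(y + 1)

def weighted_loss(X, Y, beta, p, idx=None, weights=None):
    """Sum of w_i * nll_i over the chosen indices (default: all points, w = 1)."""
    if idx is None:
        idx = range(len(Y))
    if weights is None:
        weights = [1.0] * len(idx)
    total = 0.0
    for i, w in zip(idx, weights):
        z = sum(xj * bj for xj, bj in zip(X[i], beta))
        total += w * poisson_nll(z, Y[i], p)
    return total

def uniform_subsample(n, m, rng):
    """Hypothetical baseline: m indices drawn uniformly with replacement,
    each weighted n/m so the weighted loss is an unbiased estimate."""
    idx = [rng.randrange(n) for _ in range(m)]
    return idx, [n / m] * m

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small means."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def demo(p=2, n=10_000, m=1_000, seed=0):
    """Relative error of the subsampled loss on synthetic data."""
    rng = random.Random(seed)
    beta = [1.0, 2.0]
    X = [[1.0, rng.random()] for _ in range(n)]  # z = 1 + 2u lies in [1, 3)
    Y = [sample_poisson(sum(a * b for a, b in zip(x, beta)) ** p, rng) for x in X]
    full = weighted_loss(X, Y, beta, p)
    idx, w = uniform_subsample(n, m, rng)
    sub = weighted_loss(X, Y, beta, p, idx, w)
    return abs(sub - full) / full

if __name__ == "__main__":
    for p in (1, 2):
        print(f"p={p}: relative error {demo(p=p):.4f}")
```

Including the $\log(y!)$ term keeps every per-point loss nonnegative, so the relative error printed by `demo` is well defined. On benign synthetic data like this, uniform sampling tends to do fine; the lower bounds in the abstract concern worst-case inputs, where no such luck is available.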
