ResearchTrend.AI

arXiv:2412.20189

Accurate Coresets for Latent Variable Models and Regularized Regression

31 December 2024
Sanskar Ranjan
Supratim Shit
Abstract

An accurate coreset is a weighted subset of the original dataset such that a model trained on the coreset attains the same accuracy as a model trained on the full dataset. To date, such coresets have been studied for only a limited range of machine learning models. In this paper, we introduce a unified framework for constructing accurate coresets. Using this framework, we present accurate coreset construction algorithms for general problems, including a wide range of latent variable models and ℓ_p-regularized ℓ_p-regression. For latent variable models, our coreset size is O(poly(k)), where k is the number of latent variables. For ℓ_p-regularized ℓ_p-regression, our algorithm captures the reduction in model complexity due to regularization, yielding a coreset whose size is always smaller than d^p for any regularization parameter λ > 0, where d is the dimension of the input points. This directly improves the size of the accurate coreset for ridge regression. We substantiate our theoretical findings with extensive experiments on real datasets.
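The key property behind accurate coresets for ridge regression can be illustrated directly: the ridge solution depends on the data only through the moments X^T X and X^T y, so any weighted subset matching those moments exactly reproduces the full-data solution. The sketch below (not the paper's algorithm, which produces smaller coresets with controlled weights; here signed weights are found by solving a small linear system) demonstrates this property on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 3
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
lam = 0.1  # regularization parameter

# Moment vector per point: upper triangle of x x^T, plus y * x.
# A weighted subset matching the sum of these moments yields the
# exact ridge solution, since ridge depends only on X^T X and X^T y.
def moments(x, yi):
    iu = np.triu_indices(d)
    return np.concatenate([np.outer(x, x)[iu], yi * x])

M = np.array([moments(X[i], y[i]) for i in range(n)])  # shape (n, D)
target = M.sum(axis=0)

D = M.shape[1]           # d(d+1)/2 + d moment constraints
S = np.arange(D + 1)     # candidate subset, assumed in general position
w, *_ = np.linalg.lstsq(M[S].T, target, rcond=None)  # solve for weights

def ridge(Xs, ys, ws, lam):
    # Weighted ridge: (sum_i w_i x_i x_i^T + lam I) beta = sum_i w_i y_i x_i
    A = (Xs * ws[:, None]).T @ Xs + lam * np.eye(d)
    b = (Xs * ws[:, None]).T @ ys
    return np.linalg.solve(A, b)

beta_full = ridge(X, y, np.ones(n), lam)
beta_core = ridge(X[S], y[S], w, lam)
print(np.allclose(beta_full, beta_core))  # → True
```

Note the coreset here has d(d+1)/2 + d + 1 points; the paper's contribution is that regularization allows strictly smaller accurate coresets (below d^p for ℓ_p-regression, so below d^2 for ridge).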
