arXiv: 2203.03009

Coresets for Data Discretization and Sine Wave Fitting

6 March 2022
Alaa Maalouf
M. Tukan
Eric Price
Daniel Kane
Dan Feldman
Abstract

In the \emph{monitoring} problem, the input is an unbounded stream $P = p_1, p_2, \cdots$ of integers in $[N] := \{1, \cdots, N\}$ that are obtained from a sensor (such as GPS or the heartbeats of a human). The goal (e.g., for anomaly detection) is to approximate the $n$ points received so far in $P$ by a single-frequency sine wave, e.g., $\min_{c \in C} \mathrm{cost}(P, c) + \lambda(c)$, where $\mathrm{cost}(P, c) = \sum_{i=1}^{n} \sin^2\left(\frac{2\pi}{N} p_i c\right)$, $C \subseteq [N]$ is a feasible set of solutions, and $\lambda$ is a given regularization function. For any approximation error $\varepsilon > 0$, we prove that \emph{every} set $P$ of $n$ integers has a weighted subset $S \subseteq P$ (sometimes called a coreset) of cardinality $|S| \in O(\log(N)^{O(1)})$ that approximates $\mathrm{cost}(P, c)$ (for every $c \in [N]$) up to a multiplicative factor of $1 \pm \varepsilon$. Using known coreset techniques, this implies streaming algorithms using only $O((\log(N)\log(n))^{O(1)})$ memory. Our results hold for a large family of functions. Experimental results and open source code are provided.
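To make the objective and the coreset guarantee concrete, here is a minimal brute-force sketch in Python. It evaluates $\mathrm{cost}(P, c)$, minimizes $\mathrm{cost}(P, c) + \lambda(c)$ over a feasible set $C$, and checks whether a weighted subset satisfies the $1 \pm \varepsilon$ bound for every $c \in [N]$. All helper names (`cost`, `best_frequency`, `lossless_weighted_subset`, `approximates`) and the choice $\lambda(c) = 0.01c$ are illustrative assumptions. The weighted subset built here only merges duplicate points. That is exact but can keep up to $N$ distinct points, so it is NOT the paper's $O(\log(N)^{O(1)})$ coreset construction; it only shows what the guarantee requires.

```python
import math
from collections import Counter
from typing import Callable, Dict, Iterable, Optional, Sequence


def cost(points: Sequence[int], c: int, N: int,
         weights: Optional[Sequence[float]] = None) -> float:
    """Weighted cost: sum_i w_i * sin^2(2*pi/N * p_i * c); unit weights if None."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * math.sin(2 * math.pi / N * p * c) ** 2
               for p, w in zip(points, weights))


def best_frequency(points: Sequence[int], N: int, C: Iterable[int],
                   reg: Callable[[int], float]) -> int:
    """Brute force over the feasible set C: argmin_c cost(P, c) + reg(c)."""
    return min(C, key=lambda c: cost(points, c, N) + reg(c))


def lossless_weighted_subset(points: Sequence[int]) -> Dict[int, int]:
    """Hypothetical baseline, NOT the paper's coreset: merge duplicate points
    into one point whose weight is its multiplicity (exact, but size up to N)."""
    return dict(Counter(points))


def approximates(points: Sequence[int], subset: Dict[int, float],
                 N: int, eps: float) -> bool:
    """Check |cost(P, c) - cost(S, c)| <= eps * cost(P, c) for every c in [N]."""
    S, w = list(subset.keys()), list(subset.values())
    for c in range(1, N + 1):
        full = cost(points, c, N)
        approx = cost(S, c, N, w)
        # Small absolute slack absorbs floating-point noise when full is ~0.
        if abs(full - approx) > eps * full + 1e-9:
            return False
    return True


# Usage on a toy stream (values chosen for illustration only).
N = 16
P = [3, 5, 3, 8, 13, 5, 5]
c_star = best_frequency(P, N, C=range(1, N + 1), reg=lambda c: 0.01 * c)
print(c_star)  # 8
S = lossless_weighted_subset(P)
print(approximates(P, S, N, eps=0.01))  # True
```

With $\lambda(c) = 0.01c$, both $c = 8$ and $c = 16$ give zero cost on this stream, and the regularizer breaks the tie toward the smaller frequency. Dropping points without reweighting (for example, keeping only the point 3 with weight 1) fails the check.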
