Online Learning of Halfspaces with Massart Noise

21 May 2024
Ilias Diakonikolas
Vasilis Kontonis
Christos Tzamos
Nikos Zarifis
Abstract

We study the task of online learning in the presence of Massart noise. Instead of assuming that the online adversary chooses an arbitrary sequence of labels, we assume that the context $\mathbf{x}$ is selected adversarially but the label $y$ presented to the learner disagrees with the ground-truth label of $\mathbf{x}$ with unknown probability at most $\eta$. We study the fundamental class of $\gamma$-margin linear classifiers and present a computationally efficient algorithm that achieves mistake bound $\eta T + o(T)$. Our mistake bound is qualitatively tight for efficient algorithms: it is known that even in the offline setting, achieving classification error better than $\eta$ requires super-polynomial time in the SQ model. We extend our online learning model to a $k$-arm contextual bandit setting where the rewards -- instead of satisfying commonly used realizability assumptions -- are consistent (in expectation) with some linear ranking function with weight vector $\mathbf{w}^\ast$. Given a list of contexts $\mathbf{x}_1, \ldots, \mathbf{x}_k$, if $\mathbf{w}^\ast \cdot \mathbf{x}_i > \mathbf{w}^\ast \cdot \mathbf{x}_j$, the expected reward of action $i$ must be larger than that of action $j$ by at least $\Delta$. We use our Massart online learner to design an efficient bandit algorithm that obtains expected reward at least $(1-1/k)\,\Delta T - o(T)$ larger than that of choosing a random action at every round.
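To make the learning model concrete, the sketch below simulates the Massart online setting described in the abstract: contexts with a $\gamma$-margin arrive one at a time, and each presented label is the ground-truth halfspace label flipped with probability at most $\eta$. The learner shown is a plain perceptron used only as an illustrative baseline; it is not the paper's algorithm (which achieves the $\eta T + o(T)$ mistake bound), and all parameter values and sampling choices here are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def massart_label(x, w_star, eta):
    """Ground-truth label sign(w*.x), flipped with probability <= eta.
    A Massart adversary may use any flip probability eta(x) <= eta;
    a fixed eta is used here for simplicity."""
    y_true = 1.0 if w_star @ x >= 0 else -1.0
    return -y_true if rng.random() < eta else y_true

def online_perceptron(stream, d):
    """Illustrative online learner (classic perceptron update).
    NOT the algorithm from the paper; under Massart noise the
    perceptron carries no eta*T + o(T) mistake guarantee."""
    w = np.zeros(d)
    mistakes = 0
    for x, y in stream:
        y_hat = 1.0 if w @ x >= 0 else -1.0
        if y_hat != y:        # mistake against the observed (noisy) label
            mistakes += 1
            w += y * x        # perceptron step
    return w, mistakes

# Toy run: gamma-margin unit-norm contexts, Massart-noisy labels.
d, T, eta, gamma = 5, 1000, 0.1, 0.1
w_star = np.ones(d) / np.sqrt(d)
stream = []
while len(stream) < T:
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    if abs(w_star @ x) >= gamma:   # keep only gamma-margin contexts
        stream.append((x, massart_label(x, w_star, eta)))
w, mistakes = online_perceptron(stream, d)
print(f"rounds={len(stream)}  mistakes={mistakes}  eta*T={eta * T:.0f}")
```

The same harness extends naturally to the bandit setting in the abstract: at each round one would draw $k$ contexts, reward action $i$ so that its expected reward exceeds action $j$'s by at least $\Delta$ whenever $\mathbf{w}^\ast \cdot \mathbf{x}_i > \mathbf{w}^\ast \cdot \mathbf{x}_j$, and compare the learner's cumulative reward against the uniform-random baseline.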
