Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds

14 June 2022
Shinji Ito
Taira Tsuchiya
Junya Honda
Abstract

This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BOBW) algorithm that works nearly optimally in both stochastic and adversarial settings. In stochastic settings, some existing BOBW algorithms achieve tight gap-dependent regret bounds of $O(\sum_{i: \Delta_i>0} \frac{\log T}{\Delta_i})$ for the suboptimality gap $\Delta_i$ of arm $i$ and time horizon $T$. As Audibert et al. [2007] have shown, however, the performance can be improved in stochastic environments with low-variance arms. In fact, they provided a stochastic MAB algorithm with gap-variance-dependent regret bounds of $O(\sum_{i: \Delta_i>0} (\frac{\sigma_i^2}{\Delta_i} + 1) \log T)$ for the loss variance $\sigma_i^2$ of arm $i$. In this paper, we propose the first BOBW algorithm with gap-variance-dependent bounds, showing that variance information can be exploited even in possibly adversarial environments. Further, the leading constant factor in our gap-variance-dependent bound is only (almost) twice the value of the lower bound. Additionally, the proposed algorithm enjoys multiple data-dependent regret bounds in adversarial settings and works well in stochastic settings with adversarial corruptions. The proposed algorithm is based on the follow-the-regularized-leader (FTRL) method and employs adaptive learning rates that depend on the empirical prediction error of the loss, which leads to gap-variance-dependent regret bounds reflecting the variance of the arms.
