Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs

4 October 2022
Haipeng Luo
Hanghang Tong
Mengxiao Zhang
Yuheng Zhang
arXiv: 2210.01376
Abstract

We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds. For general strongly observable graphs, we develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}\big((\sum_{t=1}^T\alpha_t)^{1/2}+\max_{t\in[T]}\alpha_t\big)$ with high probability, where $\alpha_t$ is the independence number of the feedback graph at round $t$. Compared to the best existing result [Neu, 2015], which only considers graphs with self-loops for all nodes, our result not only holds more generally, but importantly also removes any $\text{poly}(K)$ dependence that can be prohibitively large for applications such as contextual bandits. Furthermore, we also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs, which even improves the best expected regret bound of [Alon et al., 2015] by removing the $\mathcal{O}(\sqrt{KT})$ term with a refined analysis. Our algorithms are based on the online mirror descent framework, but importantly with an innovative combination of several techniques. Notably, while earlier works use optimistic biased loss estimators for achieving high-probability bounds, we find it important to use a pessimistic one for nodes without a self-loop in a strongly observable graph.
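To make the quantities in the bound concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes the standard feedback-graph definitions: an edge `(u, v)` means playing arm `u` reveals the loss of arm `v`; a node is strongly observable if it has a self-loop or is observed by every other node; and the independence number ignores edge direction and self-loops. The function names `independence_number`, `is_strongly_observable`, and `regret_rate` are hypothetical, the independence number is found by brute force (only practical for small graphs), and `regret_rate` drops all constants and logarithmic factors hidden by $\widetilde{\mathcal{O}}$.

```python
from itertools import combinations
from math import sqrt


def independence_number(n, edges):
    """Size of the largest set of nodes with no edge, in either
    direction, between two distinct members (brute force)."""
    # Drop self-loops and symmetrize: direction does not matter here.
    adjacent = {(u, v) for u, v in edges if u != v}
    adjacent |= {(v, u) for u, v in adjacent}
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all((u, v) not in adjacent for u, v in combinations(subset, 2)):
                return size
    return 0


def is_strongly_observable(n, edges):
    """True if every node has a self-loop or is observed by all other nodes."""
    edge_set = set(edges)
    for i in range(n):
        has_self_loop = (i, i) in edge_set
        seen_by_all_others = all((j, i) in edge_set for j in range(n) if j != i)
        if not (has_self_loop or seen_by_all_others):
            return False
    return True


def regret_rate(alphas):
    """(sum_t alpha_t)^(1/2) + max_t alpha_t, without constants or log factors."""
    return sqrt(sum(alphas)) + max(alphas)


K = 4
bandit = [(i, i) for i in range(K)]                         # self-loops only
full_info = [(i, j) for i in range(K) for j in range(K)]    # every arm reveals all
loopless_clique = [(i, j) for i in range(K) for j in range(K) if i != j]
cycle = [(0, 1), (1, 2), (2, 0)]                            # 3 nodes, no self-loops

alphas = [independence_number(K, g) for g in (bandit, full_info, bandit)]
print(alphas, regret_rate(alphas))
```

The loopless clique is the kind of graph the abstract singles out: it is strongly observable even though no node has a self-loop, so it falls outside the setting of [Neu, 2015]. The directed 3-cycle, in contrast, is not strongly observable because each node is observed by only one of the other two.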
