Best-of-Both-Worlds Algorithms for Partial Monitoring

29 July 2022
Taira Tsuchiya
Shinji Ito
Junya Honda
Abstract

This study considers the partial monitoring problem with $k$ actions and $d$ outcomes and provides the first best-of-both-worlds algorithms, whose regrets are favorably bounded both in the stochastic and adversarial regimes. In particular, we show that for non-degenerate locally observable games, the regret is $O(m^2 k^4 \log(T) \log(k_{\Pi} T) / \Delta_{\min})$ in the stochastic regime and $O(m k^{2/3} \sqrt{T \log(T) \log k_{\Pi}})$ in the adversarial regime, where $T$ is the number of rounds, $m$ is the maximum number of distinct observations per action, $\Delta_{\min}$ is the minimum suboptimality gap, and $k_{\Pi}$ is the number of Pareto optimal actions. Moreover, we show that for globally observable games, the regret is $O(c_{\mathcal{G}}^2 \log(T) \log(k_{\Pi} T) / \Delta_{\min}^2)$ in the stochastic regime and $O((c_{\mathcal{G}}^2 \log(T) \log(k_{\Pi} T))^{1/3} T^{2/3})$ in the adversarial regime, where $c_{\mathcal{G}}$ is a game-dependent constant. We also provide regret bounds for a stochastic regime with adversarial corruptions. Our algorithms are based on the follow-the-regularized-leader framework and are inspired by the approach of exploration by optimization and the adaptive learning rate in the field of online learning with feedback graphs.
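The abstract names the follow-the-regularized-leader (FTRL) framework as the basis of the algorithms. As a rough illustration only, and not the paper's method, the following minimal Python sketch runs FTRL on the probability simplex with a negative-entropy regularizer, which has the closed-form solution of exponential weights. The helper name `ftrl_simplex`, the fixed learning rate `eta`, and the full-information losses are all assumptions for this sketch; the paper instead uses adaptive learning rates and loss estimates built from partial-monitoring feedback.

```python
import numpy as np

def ftrl_simplex(losses, eta=0.1, seed=0):
    """Minimal FTRL on the probability simplex with a negative-entropy
    regularizer (equivalently, exponential weights). Illustration only.

    losses: (T, k) array of per-round, per-action losses in [0, 1].
            Full information is assumed here; partial monitoring would
            replace each row with an estimate built from the observed
            feedback symbol.
    Returns the total loss incurred by the sampled actions.
    """
    rng = np.random.default_rng(seed)
    T, k = losses.shape
    cum = np.zeros(k)              # cumulative losses L_{t-1}
    total = 0.0
    for t in range(T):
        # argmin_p <p, L> + (1/eta) * sum_i p_i log p_i has the closed
        # form p_i ∝ exp(-eta * L_i); subtract the max for stability.
        z = -eta * cum
        p = np.exp(z - z.max())
        p /= p.sum()
        a = rng.choice(k, p=p)     # sample an action from p_t
        total += losses[t, a]
        cum += losses[t]           # full-information update
    return total

# Toy usage: 2 actions, action 0 slightly better on average.
rng = np.random.default_rng(1)
L = rng.uniform(size=(1000, 2))
L[:, 0] *= 0.8
print(ftrl_simplex(L, eta=0.05))
```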

View on arXiv: 2207.14550