  3. 2312.15433
Best-of-Both-Worlds Algorithms for Linear Contextual Bandits

24 December 2023
Yuko Kuroki
Alberto Rumi
Taira Tsuchiya
Fabio Vitale
Nicolò Cesa-Bianchi
Abstract

We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits. Our algorithms deliver near-optimal regret bounds in both the adversarial and stochastic regimes, without prior knowledge about the environment. In the stochastic regime, we achieve the polylogarithmic rate $\frac{(dK)^2 \mathrm{poly}\log(dKT)}{\Delta_{\min}}$, where $\Delta_{\min}$ is the minimum suboptimality gap over the $d$-dimensional context space. In the adversarial regime, we obtain either the first-order $\widetilde{O}(dK\sqrt{L^*})$ bound or the second-order $\widetilde{O}(dK\sqrt{\Lambda^*})$ bound, where $L^*$ is the cumulative loss of the best action and $\Lambda^*$ is a notion of the cumulative second moment for the losses incurred by the algorithm. Moreover, we develop an algorithm based on FTRL with the Shannon entropy regularizer that does not require knowledge of the inverse of the covariance matrix, and achieves polylogarithmic regret in the stochastic regime while obtaining $\widetilde{O}\big(dK\sqrt{T}\big)$ regret bounds in the adversarial regime.
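The abstract's last result builds on FTRL (follow-the-regularized-leader) with a Shannon entropy regularizer. On the probability simplex, this regularizer admits a closed-form update: the minimizer of the regularized cumulative loss is the exponential-weights distribution. The sketch below illustrates only that core update in a toy full-information setting; it is not the paper's contextual bandit algorithm (which additionally handles context vectors and bandit loss estimation), and the loss means and learning rate are illustrative choices.

```python
import numpy as np

def ftrl_shannon_entropy(cum_loss, eta):
    """One FTRL step with the Shannon entropy regularizer on the simplex.

    Minimizing <cum_loss, p> + (1/eta) * sum_i p_i * log(p_i) over
    probability vectors p gives the closed form
        p_i ∝ exp(-eta * cum_loss_i),
    i.e. the exponential-weights distribution.
    """
    z = -eta * (cum_loss - cum_loss.min())  # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

# Toy full-information run with K = 3 actions; action 2 has the lowest
# expected loss, so FTRL should concentrate its play on it.
rng = np.random.default_rng(0)
K, T, eta = 3, 500, 0.5
means = np.array([0.7, 0.6, 0.2])  # hypothetical per-action mean losses
cum_loss = np.zeros(K)
for _ in range(T):
    p = ftrl_shannon_entropy(cum_loss, eta)
    # observe noisy losses for all actions (full information)
    cum_loss += np.clip(means + 0.1 * rng.standard_normal(K), 0.0, 1.0)
p = ftrl_shannon_entropy(cum_loss, eta)
```

After 500 rounds the distribution `p` places almost all mass on the best action. The stability shift by `cum_loss.min()` leaves the distribution unchanged (it cancels in the normalization) but prevents underflow when cumulative losses grow with $T$.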
