ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1709.04004
46
29
v1v2 (latest)

Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits

12 September 2017
Huasen Wu
Xueying Guo
Xin Liu
ArXiv (abs)PDFHTML
Abstract

In this paper, we propose and study opportunistic bandits - a new variant of bandits where the regret of pulling a suboptimal arm varies under different environmental conditions, such as network load or produce price. When the load/price is low, so is the cost/regret of pulling a suboptimal arm (e.g., trying a suboptimal network configuration). Therefore, intuitively, we could explore more when the load is low and exploit more when the load is high. Inspired by this intuition, we propose an Adaptive Upper-Confidence-Bound (AdaUCB) algorithm to adaptively balance the exploration-exploitation tradeoff for opportunistic bandits. We prove that AdaUCB achieves O(log⁡T)O(\log T)O(logT) regret with a smaller coefficient than the traditional UCB algorithm. Furthermore, AdaUCB achieves O(1)O(1)O(1) regret when the exploration cost is zero if the load level is below a certain threshold. Last, based on both synthetic data and real-world traces, experimental results show that AdaUCB significantly outperforms other bandit algorithms, such as UCB and TS (Thompson Sampling), under large load fluctuations.

View on arXiv
Comments on this paper