ResearchTrend.AI

arXiv:1610.06603

Combinatorial Multi-Armed Bandit with General Reward Functions

20 October 2016
Wei Chen
Wei Hu
Fu Li
Jiacheng Li
Yu Liu
P. Lu
Abstract

In this paper, we study the stochastic combinatorial multi-armed bandit (CMAB) framework that allows a general nonlinear reward function, whose expected value may not depend only on the means of the input random variables but possibly on the entire distributions of these variables. Our framework enables a much larger class of reward functions, such as the $\max()$ function and nonlinear utility functions. Existing techniques relying on accurate estimations of the means of random variables, such as the upper confidence bound (UCB) technique, do not work directly on these functions. We propose a new algorithm called stochastically dominant confidence bound (SDCB), which estimates the distributions of underlying random variables and their stochastically dominant confidence bounds. We prove that SDCB can achieve $O(\log T)$ distribution-dependent regret and $\tilde{O}(\sqrt{T})$ distribution-independent regret, where $T$ is the time horizon. We apply our results to the $K$-MAX problem and expected utility maximization problems. In particular, for $K$-MAX, we provide the first polynomial-time approximation scheme (PTAS) for its offline problem, and give the first $\tilde{O}(\sqrt{T})$ bound on the $(1-\epsilon)$-approximation regret of its online problem, for any $\epsilon>0$.
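The core SDCB idea described above — maintaining, for each arm, a distribution that stochastically dominates the empirical one — can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: it assumes rewards supported on $[0,1]$, uses a confidence radius of the standard $\sqrt{3\ln t/(2n)}$ form, and the function names (`dominant_cdf`, `optimistic_mean`) are ours. The empirical CDF is shifted down by the radius and the removed probability mass is placed at the top of the support, yielding an optimistic (first-order stochastically dominant) distribution.

```python
import numpy as np

def dominant_cdf(samples, t, support_max=1.0):
    """Sketch of a stochastically dominant confidence bound:
    shift the empirical CDF down by a confidence radius and move
    the freed mass to the supremum of the support (here 1.0)."""
    samples = np.sort(np.asarray(samples, dtype=float))
    n = len(samples)
    radius = np.sqrt(3.0 * np.log(t) / (2.0 * n))
    # Empirical CDF evaluated at the observed sample points.
    ecdf = np.arange(1, n + 1) / n
    # Optimistic CDF: subtract the radius, clipping at zero.
    optimistic = np.clip(ecdf - radius, 0.0, 1.0)
    # Append the top of the support, where the shifted mass lands.
    xs = np.append(samples, support_max)
    cdf = np.append(optimistic, 1.0)
    return xs, cdf

def optimistic_mean(samples, t, support_max=1.0):
    """Mean of the dominant distribution; never below the sample mean."""
    xs, cdf = dominant_cdf(samples, t, support_max)
    probs = np.diff(np.concatenate(([0.0], cdf)))  # point masses
    return float(np.dot(xs, probs))
```

Because the dominant distribution only moves probability mass upward, any monotone reward functional (such as the expected maximum in $K$-MAX) evaluated on it upper-bounds its value on the true distribution with high probability — this is what replaces the scalar UCB index when the reward depends on the whole distribution.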
