ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2104.08615
11
5

Conservative Contextual Combinatorial Cascading Bandit

17 April 2021
Kun Wang
Canzhe Zhao
Shuai Li
Shuo Shao
ArXivPDFHTML
Abstract

Conservative mechanism is a desirable property in decision-making problems which balance the tradeoff between the exploration and exploitation. We propose the novel \emph{conservative contextual combinatorial cascading bandit (C4C^4C4-bandit)}, a cascading online learning game which incorporates the conservative mechanism. At each time step, the learning agent is given some contexts and has to recommend a list of items but not worse than the base strategy and then observes the reward by some stopping rules. We design the C4C^4C4-UCB algorithm to solve the problem and prove its n-step upper regret bound for two situations: known baseline reward and unknown baseline reward. The regret in both situations can be decomposed into two terms: (a) the upper bound for the general contextual combinatorial cascading bandit; and (b) a constant term for the regret from the conservative mechanism. We also improve the bound of the conservative contextual combinatorial bandit as a by-product. Experiments on synthetic data demonstrate its advantages and validate our theoretical analysis.

View on arXiv
Comments on this paper