Adversarial Combinatorial Semi-bandits with Graph Feedback

26 February 2025
Yuxiao Wen
Main: 13 pages
Bibliography: 3 pages
Tables: 1
Appendix: 8 pages
Abstract

In combinatorial semi-bandits, a learner repeatedly selects from a combinatorial decision set of arms, receives the realized sum of rewards, and observes the rewards of the individual selected arms as feedback. In this paper, we extend this framework to include graph feedback, where the learner observes the rewards of all neighboring arms of the selected arms in a feedback graph $G$. We establish that the optimal regret over a time horizon $T$ scales as $\widetilde{\Theta}(S\sqrt{T}+\sqrt{\alpha ST})$, where $S$ is the size of the combinatorial decisions and $\alpha$ is the independence number of $G$. This result interpolates between the known regrets $\widetilde{\Theta}(S\sqrt{T})$ under full information (i.e., $G$ is complete) and $\widetilde{\Theta}(\sqrt{KST})$ under the semi-bandit feedback (i.e., $G$ has only self-loops), where $K$ is the total number of arms. A key technical ingredient is to realize a convexified action using a random decision vector with negative correlations.
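
The interpolation claimed in the abstract can be checked numerically on a small instance. The following is a minimal sketch, not code from the paper: the helper names `independence_number` and `regret_rate` are hypothetical, the feedback graph is assumed to be undirected for simplicity, and the rate ignores constants and logarithmic factors. It computes $\alpha$ by brute force for the two extreme graphs and evaluates $S\sqrt{T}+\sqrt{\alpha ST}$ at each.

```python
import math
from itertools import combinations


def independence_number(num_arms, edges):
    """Brute-force independence number of an undirected graph (small graphs only).

    Self-loops are ignored: they only say that an arm observes itself.
    """
    adjacent = {frozenset(e) for e in edges if e[0] != e[1]}
    # Try subset sizes from largest to smallest; the first independent set found is maximum.
    for size in range(num_arms, 0, -1):
        for subset in combinations(range(num_arms), size):
            if all(frozenset(pair) not in adjacent for pair in combinations(subset, 2)):
                return size
    return 0


def regret_rate(S, T, alpha):
    """The rate S*sqrt(T) + sqrt(alpha*S*T), without constants or log factors."""
    return S * math.sqrt(T) + math.sqrt(alpha * S * T)


# Illustrative values (assumptions, not taken from the paper).
K, S, T = 6, 2, 10_000

complete_graph = [(i, j) for i in range(K) for j in range(i + 1, K)]
self_loops_only = [(i, i) for i in range(K)]

alpha_full = independence_number(K, complete_graph)   # complete graph: alpha = 1
alpha_semi = independence_number(K, self_loops_only)  # only self-loops: alpha = K

rate_full = regret_rate(S, T, alpha_full)  # dominated by the S*sqrt(T) term
rate_semi = regret_rate(S, T, alpha_semi)  # within a factor 2 of sqrt(K*S*T)
```

With $\alpha = 1$ the second term is at most the first, so the rate is of order $S\sqrt{T}$. With $\alpha = K$ the first term is at most the second because $S \le K$, so the rate is of order $\sqrt{KST}$.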

@article{wen2025_2502.18826,
  title={Adversarial Combinatorial Semi-bandits with Graph Feedback},
  author={Yuxiao Wen},
  journal={arXiv preprint arXiv:2502.18826},
  year={2025}
}