Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

20 August 2023
Shintaro Nakamura
Masashi Sugiyama
arXiv:2308.10238
Abstract

We study the real-valued combinatorial pure exploration of the multi-armed bandit (R-CPE-MAB) problem. In R-CPE-MAB, a player is given $d$ stochastic arms, and the reward of each arm $s \in \{1, \ldots, d\}$ follows an unknown distribution with mean $\mu_s$. In each time step, the player pulls a single arm and observes its reward. The player's goal is to identify the optimal action $\boldsymbol{\pi}^{*} = \arg\max_{\boldsymbol{\pi} \in \mathcal{A}} \boldsymbol{\mu}^{\top}\boldsymbol{\pi}$ from a finite-sized real-valued action set $\mathcal{A} \subset \mathbb{R}^{d}$ with as few arm pulls as possible. Previous methods for R-CPE-MAB assume that the size of the action set $\mathcal{A}$ is polynomial in $d$. We introduce the Generalized Thompson Sampling Explore (GenTS-Explore) algorithm, the first algorithm that works even when the size of the action set is exponentially large in $d$. We also introduce a novel problem-dependent sample complexity lower bound for the R-CPE-MAB problem, and show that GenTS-Explore achieves the optimal sample complexity up to a problem-dependent constant factor.
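
To make the problem setup concrete, below is a minimal Python sketch of a Thompson-sampling-style pure-exploration loop for this setting. It is not the paper's GenTS-Explore algorithm; the toy action set, the unit-variance Gaussian reward model, the flat-prior posterior, and the arm-selection heuristic are all illustrative assumptions introduced here.

```python
# Illustrative sketch of Thompson-sampling-style pure exploration for the
# R-CPE-MAB setup: d arms with unknown means, a finite action set A in R^d,
# and the goal of identifying argmax_{pi in A} mu^T pi.
# NOT the paper's GenTS-Explore algorithm; all rules below are assumptions.
import numpy as np

rng = np.random.default_rng(0)

d = 3                                   # number of arms
true_mu = np.array([0.5, 0.2, 0.8])     # unknown to the player
actions = np.array([[1, 0, 1],          # toy finite action set A subset R^d
                    [0, 1, 1],
                    [1, 1, 0]], dtype=float)

counts = np.zeros(d)                    # pulls per arm
sums = np.zeros(d)                      # summed rewards per arm

def pull(arm):
    """Observe a noisy reward for a single arm (unit-variance Gaussian)."""
    return true_mu[arm] + rng.normal()

# Initialize: pull each arm once so empirical means are defined.
for s in range(d):
    sums[s] += pull(s)
    counts[s] += 1

for t in range(5000):
    mean = sums / counts
    std = 1.0 / np.sqrt(counts)         # posterior std under a flat prior
    mu_sample = rng.normal(mean, std)   # Thompson sample of the mean vector

    # Best action under the sampled means vs. under the empirical means.
    best_sampled = int(np.argmax(actions @ mu_sample))
    best_empirical = int(np.argmax(actions @ mean))

    if best_sampled == best_empirical:
        # Sample agrees with the empirical leader; refine the least-pulled
        # arm that the leading action actually uses (crude heuristic).
        used = np.flatnonzero(actions[best_empirical])
        arm = int(used[np.argmin(counts[used])])
    else:
        # Disagreement: pull the arm with the largest posterior uncertainty.
        arm = int(np.argmax(std))

    sums[arm] += pull(arm)
    counts[arm] += 1

print("estimated optimal action:",
      actions[int(np.argmax(actions @ (sums / counts)))])
```

Note that this sketch computes the argmax by enumerating $\mathcal{A}$ explicitly, which is only feasible when the action set is small; handling action sets exponentially large in $d$ is precisely the regime the paper's GenTS-Explore algorithm addresses.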
