Batched Stochastic Bandit for Nondegenerate Functions

9 May 2024

Abstract

This paper studies batched bandit learning problems for nondegenerate functions. We introduce an algorithm, called Geometric Narrowing (GN), that solves the batched bandit problem for nondegenerate functions near-optimally: its regret bound is of order $\widetilde{\mathcal{O}} ( A_{+}^d \sqrt{T} )$, and it needs only $\mathcal{O} (\log \log T)$ batches to achieve this regret. We also provide a lower bound analysis for this problem. Specifically, we prove that over some (compact) doubling metric space of doubling dimension $d$: (1) for any policy $\pi$, there exists a problem instance on which $\pi$ admits a regret of order $\Omega ( A_-^d \sqrt{T} )$; (2) no policy can achieve a regret of order $A_-^d \sqrt{T}$ over all problem instances using fewer than $\Omega ( \log \log T )$ rounds of communication. Our lower bound analysis shows that the GN algorithm achieves near-optimal regret with a minimal number of batches.
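
To give a feel for how few batches $\mathcal{O}(\log \log T)$ is, the sketch below builds a batch grid with end points $t_i = \lceil T^{1 - 2^{-i}} \rceil$. This grid is an assumption chosen for illustration, of a form commonly seen in batched bandit analyses; the abstract does not state the schedule GN actually uses, and the function name `doubling_exponent_grid` is hypothetical.

```python
import math


def doubling_exponent_grid(T: int) -> list[int]:
    """Illustrative batch end points t_i = ceil(T ** (1 - 2 ** -i)), ending at T.

    Assumption: this is NOT taken from the paper; it only shows that a grid
    of this shape reaches horizon T in roughly log2(log2(T)) batches.
    """
    if T < 2:
        return [T]
    # Smallest M with T ** (2 ** -M) <= 2, i.e. M >= log2(log2(T)).
    M = max(1, math.ceil(math.log2(math.log2(T))))
    grid: list[int] = []
    for i in range(1, M + 1):
        t = min(T, math.ceil(T ** (1 - 2.0 ** -i)))
        # Keep the grid strictly increasing (small T can produce repeats).
        if not grid or t > grid[-1]:
            grid.append(t)
    # The last batch always runs to the end of the horizon.
    if grid[-1] < T:
        grid.append(T)
    return grid


if __name__ == "__main__":
    for horizon in (10**3, 10**6, 10**9):
        g = doubling_exponent_grid(horizon)
        print(horizon, len(g), g)
```

Under this assumed grid, going from a horizon of $10^3$ to $10^9$ adds only a couple of batches, which is the sense in which the batch count grows doubly logarithmically in $T$.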


@article{liu2025_2405.05733,
  title={Batched Stochastic Bandit for Nondegenerate Functions},
  author={Yu Liu and Yunlu Shu and Tianyu Wang},
  journal={arXiv preprint arXiv:2405.05733},
  year={2025}
}
