arXiv:2106.10417

Variance-Dependent Best Arm Identification

19 June 2021
P. Lu
Chao Tao
Xiaojin Zhang
Abstract

We study the problem of identifying the best arm in a stochastic multi-armed bandit game. Given a set of $n$ arms indexed from $1$ to $n$, each arm $i$ is associated with an unknown reward distribution supported on $[0,1]$ with mean $\theta_i$ and variance $\sigma_i^2$. Assume $\theta_1 > \theta_2 \geq \cdots \geq \theta_n$. We propose an adaptive algorithm that explores the gaps and variances of the arms' rewards and makes future decisions based on the gathered information, using a novel approach called grouped median elimination. The proposed algorithm is guaranteed to output the best arm with probability $(1-\delta)$ and uses at most $O\left(\sum_{i=1}^{n} \left(\frac{\sigma_i^2}{\Delta_i^2} + \frac{1}{\Delta_i}\right)\left(\ln \delta^{-1} + \ln \ln \Delta_i^{-1}\right)\right)$ samples, where $\Delta_i$ ($i \geq 2$) denotes the reward gap between arm $i$ and the best arm, and we define $\Delta_1 = \Delta_2$. This achieves a significant advantage over variance-independent algorithms in some favorable scenarios and is the first result that removes the extra $\ln n$ factor on the best arm compared with the state of the art. We further show that $\Omega\left(\sum_{i=1}^{n} \left(\frac{\sigma_i^2}{\Delta_i^2} + \frac{1}{\Delta_i}\right) \ln \delta^{-1}\right)$ samples are necessary for an algorithm to achieve the same goal, thereby illustrating that our algorithm is optimal up to doubly logarithmic terms.
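To make the bounds in the abstract concrete, the sketch below evaluates the quantities inside the $O(\cdot)$ and $\Omega(\cdot)$ expressions for a given instance, with all hidden constants set to 1. It is not the paper's algorithm, only a numeric illustration. The function names, the clamping of $\ln \ln \Delta_i^{-1}$ at zero for large gaps, and the variance-independent comparison quantity $\sum_i \Delta_i^{-2}(\ln \delta^{-1} + \ln \ln \Delta_i^{-1})$ are assumptions introduced here and do not come from the abstract.

```python
import math


def gaps(means):
    """Gaps Delta_i for means sorted in decreasing order with a unique best arm.

    Follows the abstract's convention Delta_1 = Delta_2.
    """
    best = means[0]
    others = [best - m for m in means[1:]]
    return [others[0]] + others


def _log_factor(gap, delta):
    # ln(1/delta) + ln ln(1/gap); the inner log is clamped at 1 so the
    # doubly logarithmic term is never negative or undefined (an assumption
    # made here, since the abstract's O(.) hides such details).
    loglog = math.log(max(math.log(1.0 / gap), 1.0))
    return math.log(1.0 / delta) + loglog


def upper_bound_quantity(means, variances, delta):
    """Sum inside the abstract's O(.) upper bound, constants ignored."""
    return sum(
        (var / gap**2 + 1.0 / gap) * _log_factor(gap, delta)
        for gap, var in zip(gaps(means), variances)
    )


def lower_bound_quantity(means, variances, delta):
    """Sum inside the abstract's Omega(.) lower bound, constants ignored."""
    return sum(
        (var / gap**2 + 1.0 / gap) * math.log(1.0 / delta)
        for gap, var in zip(gaps(means), variances)
    )


def variance_independent_quantity(means, delta):
    """Hypothetical baseline: sum of 1/Delta_i^2 times the same log factor."""
    return sum(_log_factor(gap, delta) / gap**2 for gap in gaps(means))


# Illustrative instance (made-up numbers): three low-variance arms.
means = [0.9, 0.8, 0.5]
variances = [0.001, 0.001, 0.001]
delta = 0.05

print("variance-dependent upper:", upper_bound_quantity(means, variances, delta))
print("lower bound quantity:    ", lower_bound_quantity(means, variances, delta))
print("variance-independent:    ", variance_independent_quantity(means, delta))
```

When the variances are small relative to the gaps, the $1/\Delta_i$ term dominates the variance-dependent sum, which is where the advantage over a $1/\Delta_i^2$ dependence comes from. The upper and lower quantities differ only in the $\ln \ln \Delta_i^{-1}$ term, matching the "optimal up to doubly logarithmic terms" claim.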
