Identifying the Best Transition Law

17 February 2025
Mehrasa Ahmadipour
Elise Crépon
Aurélien Garivier
Abstract

Motivated by recursive learning in Markov Decision Processes, this paper studies best-arm identification in bandit problems where each arm's reward is drawn from a multinomial distribution with a known support. We compare the performance reached by several strategies, notably LUCB, without and with use of this knowledge. In the first case, we use classical non-parametric approaches for the confidence intervals. In the second case, where a probability distribution is to be estimated, we first use classical deviation bounds (Hoeffding and Bernstein) on each dimension independently, and then the Empirical Likelihood method (EL-LUCB) on the joint probability vector. The effectiveness of these methods is demonstrated through simulations on scenarios with varying levels of structural complexity.
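
To make the LUCB strategy mentioned in the abstract concrete, here is a minimal sketch of LUCB with Hoeffding intervals on each arm's scalar mean reward. It is an illustration under stated assumptions, not the authors' implementation: the function name `lucb_hoeffding`, the exploration rate `log(4 K t^2 / delta)`, the round cap `max_rounds`, and the example arms are all hypothetical choices made here. The sketch uses only the range of the support, so it is closest in spirit to the non-parametric baseline; it does not implement the per-dimension bounds or EL-LUCB.

```python
import numpy as np


def lucb_hoeffding(support, probs, delta=0.05, max_rounds=100_000, seed=0):
    """LUCB with Hoeffding intervals on each arm's mean reward.

    support: reward values shared by all arms.
    probs:   K x d array; row k is arm k's multinomial law (hidden from the
             learner, used here only to simulate pulls).
    Returns (index of the recommended arm, total number of pulls).
    """
    rng = np.random.default_rng(seed)
    support = np.asarray(support, dtype=float)
    probs = np.asarray(probs, dtype=float)
    n_arms = probs.shape[0]
    width = support.max() - support.min()  # range used by Hoeffding's bound

    def pull(k):
        # Draw one reward from arm k's multinomial law.
        return rng.choice(support, p=probs[k])

    # Initialise by pulling every arm once.
    counts = np.ones(n_arms)
    sums = np.array([pull(k) for k in range(n_arms)])

    for t in range(1, max_rounds + 1):
        means = sums / counts
        # Assumed exploration rate; other choices appear in the literature.
        beta = np.log(4.0 * n_arms * t**2 / delta)
        radius = width * np.sqrt(beta / (2.0 * counts))

        leader = int(np.argmax(means))
        ucb = means + radius
        ucb[leader] = -np.inf  # exclude the leader from the challenger search
        challenger = int(np.argmax(ucb))

        # Stop when the leader's lower bound clears the best rival upper bound.
        if means[leader] - radius[leader] >= ucb[challenger]:
            return leader, int(counts.sum())

        # Otherwise sample both the leader and the challenger.
        for k in (leader, challenger):
            sums[k] += pull(k)
            counts[k] += 1

    # Round cap reached: fall back to the empirical best arm.
    return int(np.argmax(sums / counts)), int(counts.sum())


# Hypothetical example: three arms on support {0, 1} with means 0.2, 0.8, 0.3.
best, pulls = lucb_hoeffding([0.0, 1.0], [[0.8, 0.2], [0.2, 0.8], [0.7, 0.3]])
```

Sampling only the leader and the challenger concentrates pulls on the arms that are hardest to separate, which is why tighter confidence intervals (Bernstein or Empirical Likelihood, as studied in the paper) are expected to translate into fewer samples before stopping.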

@article{ahmadipour2025_2502.12227,
  title={ Identifying the Best Transition Law },
  author={ Mehrasa Ahmadipour and Elise Crépon and Aurélien Garivier },
  journal={arXiv preprint arXiv:2502.12227},
  year={ 2025 }
}