ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2003.01803
29
30

MOTS: Minimax Optimal Thompson Sampling

3 March 2020
Tianyuan Jin
Pan Xu
Jieming Shi
Xiaokui Xiao
Quanquan Gu
ArXivPDFHTML
Abstract

Thompson sampling is one of the most widely used algorithms for many online decision problems, due to its simplicity in implementation and superior empirical performance over other state-of-the-art methods. Despite its popularity and empirical success, it has remained an open problem whether Thompson sampling can match the minimax lower bound Ω(KT)\Omega(\sqrt{KT})Ω(KT​) for KKK-armed bandit problems, where TTT is the total time horizon. In this paper, we solve this long open problem by proposing a variant of Thompson sampling called MOTS that adaptively clips the sampling instance of the chosen arm at each time step. We prove that this simple variant of Thompson sampling achieves the minimax optimal regret bound O(KT)O(\sqrt{KT})O(KT​) for finite time horizon TTT, as well as the asymptotic optimal regret bound for Gaussian rewards when TTT approaches infinity. To our knowledge, MOTS is the first Thompson sampling type algorithm that achieves the minimax optimality for multi-armed bandit problems.

View on arXiv
Comments on this paper