
(More) Efficient Reinforcement Learning via Posterior Sampling

4 June 2013
Ian Osband
Daniel Russo
Benjamin Van Roy
arXiv:1306.0940 [abs | PDF | HTML]
Abstract

Most provably-efficient learning algorithms introduce optimism about poorly-understood states and actions to encourage exploration. We study an alternative approach for efficient exploration, \emph{posterior sampling for reinforcement learning} (PSRL). This algorithm proceeds in repeated episodes of known duration. At the start of each episode, PSRL updates a prior distribution over Markov decision processes and takes one sample from this posterior. PSRL then follows the policy that is optimal for this sample during the episode. The algorithm is conceptually simple, computationally efficient and allows an agent to encode prior knowledge in a natural way. We establish an $\tilde{O}(\tau S \sqrt{AT})$ bound on the expected regret, where $T$ is time, $\tau$ is the episode length and $S$ and $A$ are the cardinalities of the state and action spaces. This bound is one of the first for an algorithm not based on optimism, and close to the state of the art for any reinforcement learning algorithm. We show through simulation that PSRL significantly outperforms existing algorithms with similar regret bounds.
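
The abstract fully describes the PSRL loop, so a short sketch may help make it concrete. The following is a minimal, hypothetical tabular implementation, not code from the paper: it assumes an episodic environment with Bernoulli rewards, a Dirichlet prior over transition probabilities, a Beta prior over rewards, and an `env` object exposing `reset()` and `step(a)`; all of these interface and prior choices are illustrative assumptions.

```python
import numpy as np

def psrl(env, S, A, tau, num_episodes, seed=0):
    """Minimal tabular PSRL sketch (illustrative assumptions, not the paper's code).

    Assumes: episodes of fixed length `tau`, Bernoulli rewards in {0, 1},
    Dirichlet prior over transitions, Beta prior over rewards, and a
    hypothetical `env` with reset() -> state and step(a) -> (next_state, reward).
    """
    rng = np.random.default_rng(seed)
    trans_counts = np.ones((S, A, S))   # Dirichlet(1, ..., 1) pseudo-counts
    rew_alpha = np.ones((S, A))         # Beta(1, 1) pseudo-counts
    rew_beta = np.ones((S, A))

    for _ in range(num_episodes):
        # 1. Take one sample MDP from the current posterior.
        P = np.array([[rng.dirichlet(trans_counts[s, a]) for a in range(A)]
                      for s in range(S)])          # shape (S, A, S)
        R = rng.beta(rew_alpha, rew_beta)          # shape (S, A)

        # 2. Solve the sampled MDP by finite-horizon dynamic programming.
        V = np.zeros(S)
        policy = np.zeros((tau, S), dtype=int)
        for h in reversed(range(tau)):
            Q = R + P @ V                          # Q_h(s, a), shape (S, A)
            policy[h] = Q.argmax(axis=1)
            V = Q.max(axis=1)

        # 3. Follow the sampled MDP's optimal policy; update the posterior.
        s = env.reset()
        for h in range(tau):
            a = policy[h, s]
            s_next, r = env.step(a)
            trans_counts[s, a, s_next] += 1
            rew_alpha[s, a] += r
            rew_beta[s, a] += 1 - r
            s = s_next
```

Exploration in this sketch comes entirely from the randomness of the posterior sample: early on the sampled MDPs vary widely, so the agent tries diverse policies, and as the pseudo-counts grow the samples concentrate around the posterior mean and the behaviour converges.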
