Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

29 October 2015
Christoph Dann
Emma Brunskill
Abstract

Recently, there has been significant progress in understanding reinforcement learning in discounted infinite-horizon Markov decision processes (MDPs) by deriving tight sample complexity bounds. However, in many real-world applications, an interactive learning agent operates for a fixed or bounded period of time, for example tutoring students for exams or handling customer service requests. Such scenarios can often be better treated as episodic fixed-horizon MDPs, for which only looser bounds on the sample complexity exist. A natural notion of sample complexity in this setting is the number of episodes required to guarantee a certain performance with high probability (PAC guarantee). In this paper, we derive an upper PAC bound $\tilde O(\frac{|\mathcal S|^2 |\mathcal A| H^2}{\epsilon^2} \ln\frac 1 \delta)$ and a lower PAC bound $\tilde \Omega(\frac{|\mathcal S| |\mathcal A| H^2}{\epsilon^2} \ln \frac 1 {\delta + c})$ that match up to log-terms and an additional linear dependency on the number of states $|\mathcal S|$. The lower bound is the first of its kind for this setting. Our upper bound leverages Bernstein's inequality to improve on previous bounds for episodic finite-horizon MDPs which have a time-horizon dependency of at least $H^3$.
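As a rough illustration of how the stated upper bound scales, the sketch below evaluates its order term for some example problem sizes. This is only a numeric illustration of the formula quoted in the abstract; the example values of $|\mathcal S|$, $|\mathcal A|$, $H$, $\epsilon$, and $\delta$ are assumptions and not taken from the paper, and constants and log factors hidden by the $\tilde O$ notation are ignored.

```python
import math

def upper_bound_order(num_states, num_actions, horizon, epsilon, delta):
    """Order term of the upper PAC bound on the number of episodes,
    O~(|S|^2 |A| H^2 / eps^2 * ln(1/delta)), with constants and log factors dropped."""
    return (num_states ** 2 * num_actions * horizon ** 2 / epsilon ** 2) * math.log(1.0 / delta)

# Illustrative values (not from the paper): |S| = 10, |A| = 4, H = 20, eps = 0.1, delta = 0.05
episodes_h2 = upper_bound_order(10, 4, 20, 0.1, 0.05)
print(f"Order of required episodes with H^2 dependency: {episodes_h2:.2e}")

# For contrast, an H^3 dependency (as in the earlier bounds mentioned in the abstract)
# adds one extra factor of H:
episodes_h3 = episodes_h2 * 20
print(f"Same bound with H^3 instead of H^2: {episodes_h3:.2e}")
```

For a horizon of 20 steps, the improvement from $H^3$ to $H^2$ reduces the order term by a factor of 20, which is the practical point of the tighter analysis via Bernstein's inequality.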
