Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning

1 November 2021
Yuanzhi Li
Ruosong Wang
Lin F. Yang
Abstract

Recently there has been a surge of interest in understanding the horizon-dependence of the sample complexity in reinforcement learning (RL). Notably, for an RL environment with horizon length $H$, previous work has shown that there is a probably approximately correct (PAC) algorithm that learns an $O(1)$-optimal policy using $\mathrm{polylog}(H)$ episodes of environment interactions when the number of states and actions is fixed. It remained unknown whether the $\mathrm{polylog}(H)$ dependence is necessary. In this work, we resolve this question by developing an algorithm that achieves the same PAC guarantee while using only $O(1)$ episodes of environment interactions, completely settling the horizon-dependence of the sample complexity in RL. We achieve this bound by (i) establishing a connection between value functions in discounted and finite-horizon Markov decision processes (MDPs) and (ii) a novel perturbation analysis in MDPs. We believe our new techniques are of independent interest and could be applied to related questions in RL.
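For readers unfamiliar with point (i), a standard textbook relation between discounted and finite-horizon values illustrates the kind of connection the abstract refers to (this is background only, not the paper's specific construction). For a discounted MDP with discount factor $\gamma \in (0,1)$ and per-step rewards in $[0,1]$, truncating the return of any policy $\pi$ after $H$ steps changes its value at any state $s$ by at most a geometric tail:

\[
\bigl| V^{\pi}_{\gamma}(s) - V^{\pi}_{\gamma,H}(s) \bigr|
= \Bigl| \mathbb{E}_{\pi}\Bigl[\textstyle\sum_{t=H}^{\infty} \gamma^{t} r_t \,\Big|\, s_0 = s\Bigr] \Bigr|
\le \sum_{t=H}^{\infty} \gamma^{t}
= \frac{\gamma^{H}}{1-\gamma},
\]

where $V^{\pi}_{\gamma}$ is the infinite-horizon discounted value and $V^{\pi}_{\gamma,H}$ its $H$-step truncation. Thus an "effective horizon" of roughly $\frac{1}{1-\gamma}\log\frac{1}{\epsilon(1-\gamma)}$ steps already determines the value up to $\epsilon$; the paper's contribution builds a sharper bridge of this flavor between the two formulations, combined with a perturbation analysis.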
