When is Agnostic Reinforcement Learning Statistically Tractable?

9 October 2023
Zeyu Jia, Gene Li, Alexander Rakhlin, Ayush Sekhari, Nathan Srebro
OffRL
Abstract

We study the problem of agnostic PAC reinforcement learning (RL): given a policy class Π, how many rounds of interaction with an unknown MDP (with a potentially large state and action space) are required to learn an ϵ-suboptimal policy with respect to Π? Towards that end, we introduce a new complexity measure, called the "spanning capacity", that depends solely on the set Π and is independent of the MDP dynamics. With a generative model, we show that for any policy class Π, bounded spanning capacity characterizes PAC learnability. However, for online RL, the situation is more subtle. We show there exists a policy class Π with a bounded spanning capacity that requires a superpolynomial number of samples to learn. This reveals a surprising separation for agnostic learnability between generative access and online access models (as well as between deterministic/stochastic MDPs under online access). On the positive side, we identify an additional "sunflower" structure, which, in conjunction with bounded spanning capacity, enables statistically efficient online RL via a new algorithm called POPLER, which takes inspiration from classical importance sampling methods as well as techniques for reachable-state identification and policy evaluation in reward-free exploration.
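
For concreteness, the agnostic PAC goal stated above admits a standard formalization (a sketch; the value notation V^π for the expected cumulative reward of policy π is our assumption, not defined on this page): with probability at least 1 - δ, the learner must output a policy π̂ satisfying

    V^{\hat{\pi}} \;\ge\; \max_{\pi \in \Pi} V^{\pi} - \epsilon,

using a number of rounds of interaction that scales polynomially in 1/ϵ and log(1/δ), but not with the sizes of the state and action spaces.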
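
POPLER itself is not reproduced on this page, but the "classical importance sampling methods" it draws on are standard off-policy evaluation tools. Below is a minimal sketch of ordinary (trajectory-level) importance sampling; the function names and the finite-horizon, undiscounted setup are illustrative assumptions, not the paper's algorithm:

    def evaluate_policy_is(trajectories, target_policy, behavior_policy):
        """Ordinary importance-sampling estimate of a target policy's value.

        trajectories:    list of episodes, each a list of (state, action, reward)
                         tuples collected by the behavior policy.
        target_policy:   target_policy(state, action) -> probability that the
                         policy being evaluated takes `action` in `state`.
        behavior_policy: behavior_policy(state, action) -> probability that the
                         data-collecting policy took `action` in `state`.
        """
        estimates = []
        for episode in trajectories:
            weight, episode_return = 1.0, 0.0
            for state, action, reward in episode:
                # Reweight the trajectory by the likelihood ratio of the
                # two policies at each step.
                weight *= target_policy(state, action) / behavior_policy(state, action)
                episode_return += reward
            estimates.append(weight * episode_return)
        # Unbiased (but potentially high-variance) estimate of the target value.
        return sum(estimates) / len(estimates)

Because the importance weight is a product of per-step likelihood ratios, its variance can grow exponentially with the horizon; this well-known limitation is consistent with the abstract's point that POPLER combines importance-sampling ideas with reachable-state identification and policy evaluation techniques to achieve statistical efficiency.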

View on arXiv: https://arxiv.org/abs/2310.06113