Variance-Reduced Conservative Policy Iteration

12 December 2022
Naman Agarwal
Brian Bullins
Karan Singh
Abstract

We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reductions-based algorithms exhibit local convergence in the function space, as opposed to the parameter space for policy gradient algorithms, and thus are unaffected by the possibly non-linear or discontinuous parameterization of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing an $\varepsilon$-functional local optimum from $O(\varepsilon^{-4})$ to $O(\varepsilon^{-3})$. Under state-coverage and policy-completeness assumptions, the algorithm enjoys $\varepsilon$-global optimality after sampling $O(\varepsilon^{-2})$ times, improving upon the previously established $O(\varepsilon^{-3})$ sample requirement.
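
For context, the classical Conservative Policy Iteration update (Kakade & Langford, 2002) that this variant builds on mixes the current policy with an approximately greedy one, taking a small step in function space rather than parameter space. The notation below is the standard formulation, not taken from this paper:

\[
\pi_{t+1} = (1-\alpha)\,\pi_t + \alpha\,\pi_t', \qquad
\pi_t' \approx \arg\max_{\pi \in \Pi} \; \mathbb{E}_{s \sim d^{\pi_t}}\!\left[ A^{\pi_t}\big(s, \pi(s)\big) \right],
\]

where $d^{\pi_t}$ is the state-visitation distribution of the current policy, $A^{\pi_t}$ is its advantage function, and the inner maximization is carried out only approximately, as an empirical risk minimization problem over the policy class $\Pi$ on sampled states. The variance reduction proposed in the paper presumably targets the sample cost of this per-iteration estimation step.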
