Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation

26 May 2021
Zaiwei Chen
S. Khodadadian
S. T. Maguluri
Abstract

In this paper, we develop a novel variant of the off-policy natural actor-critic algorithm with linear function approximation and establish a sample complexity of $\mathcal{O}(\epsilon^{-3})$, improving upon all previously known convergence bounds for such algorithms. To overcome the divergence caused by the deadly triad in off-policy policy evaluation under function approximation, we develop a critic that employs an $n$-step TD-learning algorithm with a properly chosen $n$. We present finite-sample convergence bounds for this critic under both constant and diminishing step sizes, which are of independent interest. Furthermore, we develop a variant of natural policy gradient under function approximation with an improved convergence rate of $\mathcal{O}(1/T)$ after $T$ iterations. Combining the finite-sample error bounds of the actor and the critic, we obtain the $\mathcal{O}(\epsilon^{-3})$ sample complexity. We derive our sample complexity bounds solely under the assumption that the behavior policy sufficiently explores all states and actions, which is a much lighter assumption than those made in the related literature.
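The abstract describes two ingredients: an off-policy critic based on $n$-step TD-learning with linear function approximation, and a natural-policy-gradient actor. The following is a minimal sketch of that general structure on a small synthetic MDP; it is not the paper's exact algorithm, and all names (`P`, `R`, `phi`, `w`, `theta`) and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, n = 5, 3, 0.9, 4                    # states, actions, discount, TD horizon
P = rng.dirichlet(np.ones(S), size=(S, A))       # transition kernel P[s, a, :]
R = rng.uniform(size=(S, A))                     # reward table R[s, a]
phi = np.eye(S)                                  # linear (here tabular) state features

behavior = np.full((S, A), 1.0 / A)              # uniform behavior policy: explores all (s, a)
theta = np.zeros((S, A))                         # actor parameters, softmax policy
w = np.zeros(S)                                  # critic parameters, V(s) ~ phi[s] @ w
alpha_c, alpha_a = 0.05, 0.01                    # critic / actor step sizes (illustrative)

def pi(s):
    """Softmax target policy over actions in state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

s, buf = 0, []                                   # buf holds the last n transitions
for t in range(50_000):
    a = rng.choice(A, p=behavior[s])
    s_next = rng.choice(S, p=P[s, a])
    rho = pi(s)[a] / behavior[s, a]              # importance-sampling ratio
    buf.append((s, a, R[s, a], rho))
    if len(buf) == n:
        s0, a0 = buf[0][0], buf[0][1]
        rho_prod = np.prod([b[3] for b in buf])  # IS correction over the n-step window
        G = sum(gamma ** k * buf[k][2] for k in range(n)) + gamma ** n * phi[s_next] @ w
        td_err = G - phi[s0] @ w
        w += alpha_c * rho_prod * td_err * phi[s0]   # n-step TD critic update
        # Softmax natural-gradient-style actor step: nudge the logit of the taken
        # action by an IS-weighted advantage estimate; using the n-step TD error
        # as that estimate is a simplification for this sketch.
        theta[s0, a0] += alpha_a * rho_prod * td_err
        buf.pop(0)
    s = s_next
```

The choice of the horizon $n$ matters: the paper's point is that a properly chosen $n$ keeps the off-policy linear-function-approximation critic stable, whereas the value used above is purely illustrative.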
