

On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction

2 June 2021
Jiawei Huang
Nan Jiang
arXiv:2106.00993
Abstract

In this paper, we study the convergence properties of off-policy policy improvement algorithms with state-action density ratio correction under the function approximation setting, where the objective function is formulated as a max-max-min optimization problem. We characterize the bias of the learning objective and present two strategies with finite-time convergence guarantees. In our first strategy, we present the algorithm P-SREDA with convergence rate $O(\epsilon^{-3})$, whose dependency on $\epsilon$ is optimal. In our second strategy, we propose a new off-policy actor-critic-style algorithm named O-SPIM. We prove that O-SPIM converges to a stationary point with total complexity $O(\epsilon^{-4})$, which matches the convergence rate of some recent actor-critic algorithms in the on-policy setting.
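To illustrate where a max-max-min structure can arise in density-ratio-corrected off-policy optimization, the sketch below shows one common minimax-weight-learning-style construction. The notation is illustrative and not necessarily the paper's exact formulation: $w$ stands for the state-action density ratio, $f$ for an adversarial test function, $d^{D}$ for the off-policy data distribution, $\mu_0$ for the initial state distribution, and $\gamma$ for the discount factor.

% Illustrative sketch only (not the paper's exact objective).
% w(s,a) plays the role of d^pi(s,a) / d^D(s,a); f ranges over a class of test functions.
%
% (1) Ratio-weighted estimate of the policy value from off-policy data:
\[
J(\pi) \;=\; \mathbb{E}_{(s,a,r)\sim d^{D}}\big[\, w(s,a)\, r \,\big].
\]
% (2) Stationarity (moment) condition that a correct ratio w must satisfy for every f:
\[
(1-\gamma)\,\mathbb{E}_{s_0\sim\mu_0,\; a_0\sim\pi(\cdot\mid s_0)}\big[f(s_0,a_0)\big]
\;+\;
\mathbb{E}_{(s,a,s')\sim d^{D}}\Big[\, w(s,a)\,\big(\gamma\,\mathbb{E}_{a'\sim\pi(\cdot\mid s')}[f(s',a')] - f(s,a)\big) \Big]
\;=\; 0.
\]

Folding condition (2) into objective (1) as an adversarial penalty over $f$, and then optimizing the policy, yields a nested problem of the form $\max_{\pi}\,\max_{w}\,\min_{f}$, which is the max-max-min structure referred to in the abstract.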
