On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes

29 November 2012
Bruno Scherrer, Boris Lesner
Abstract

We consider infinite-horizon stationary $\gamma$-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. Using Value and Policy Iteration with some error $\epsilon$ at each iteration, it is well known that one can compute stationary policies that are $\frac{2\gamma}{(1-\gamma)^2}\epsilon$-optimal. After arguing that this guarantee is tight, we develop variations of Value and Policy Iteration for computing non-stationary policies that can be up to $\frac{2\gamma}{1-\gamma}\epsilon$-optimal, which constitutes a significant improvement in the usual situation where $\gamma$ is close to $1$. Surprisingly, this shows that the problem of "computing near-optimal non-stationary policies" is much simpler than that of "computing near-optimal stationary policies".
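
The construction behind the improved bound can be illustrated concretely: instead of returning only the final greedy policy, approximate value iteration keeps the greedy policies from its last $m$ iterations and executes them cyclically, yielding a periodic non-stationary policy. Below is a minimal NumPy sketch of that idea under stated assumptions; it is not the authors' exact algorithm, and the toy MDP, the uniform-noise model for the per-iteration error $\epsilon$, and all function names are illustrative.

```python
import numpy as np

def approx_value_iteration(P, R, gamma, n_iters, eps, rng, m=4):
    """Illustrative sketch (not the paper's exact algorithm): run value
    iteration with a bounded additive error eps at each backup, record the
    greedy policy of every iteration, and return the last m policies.
    Executing them cyclically gives the kind of periodic non-stationary
    policy the abstract says can be up to 2*gamma/(1-gamma)*eps-optimal,
    versus 2*gamma/(1-gamma)^2*eps for a single stationary policy.

    P: transition tensor, shape (A, S, S); R: rewards, shape (A, S).
    """
    n_states = P.shape[1]
    v = np.zeros(n_states)
    policies = []
    for _ in range(n_iters):
        # One-step lookahead Q-values under the current value estimate.
        q = R + gamma * P @ v                # shape (A, S)
        policies.append(np.argmax(q, axis=0))
        # Bellman backup corrupted by bounded noise, standing in for the
        # per-iteration approximation error eps.
        v = q.max(axis=0) + rng.uniform(-eps, eps, size=n_states)
    return v, policies[-m:]                  # last m greedy policies

def run_cyclic_policy(P, R, gamma, policy_cycle, s0, horizon, rng):
    """Roll out the periodic non-stationary policy that applies
    policy_cycle[t % m] at time t; return the discounted return."""
    s, ret, disc = s0, 0.0, 1.0
    m = len(policy_cycle)
    for t in range(horizon):
        a = policy_cycle[t % m][s]
        ret += disc * R[a, s]
        s = rng.choice(P.shape[1], p=P[a, s])
        disc *= gamma
    return ret

# Tiny random MDP to exercise the sketch (hypothetical test data).
rng = np.random.default_rng(0)
A, S = 3, 5
P = rng.dirichlet(np.ones(S), size=(A, S))   # row-stochastic transitions
R = rng.uniform(0.0, 1.0, size=(A, S))
_, cycle = approx_value_iteration(P, R, gamma=0.95, n_iters=200,
                                  eps=0.05, rng=rng)
print(run_cyclic_policy(P, R, 0.95, cycle, s0=0, horizon=500, rng=rng))
```

With $m = 1$ the cycle degenerates to ordinary approximate value iteration and its stationary greedy policy; taking $m > 1$ is what, per the abstract, trades the $1/(1-\gamma)^2$ dependence on the horizon for $1/(1-\gamma)$.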

View on arXiv: https://arxiv.org/abs/1211.6898