arXiv:1910.02919

Multi-step Greedy Reinforcement Learning Algorithms

7 October 2019
Manan Tomar, Yonathan Efroni, Mohammad Ghavamzadeh
Abstract

Multi-step greedy policies have been extensively used in model-based reinforcement learning (RL), both when a model of the environment is available (e.g., in the game of Go) and when it is learned. In this paper, we explore their benefits in model-free RL, when employed using multi-step dynamic programming algorithms: κ-Policy Iteration (κ-PI) and κ-Value Iteration (κ-VI). These methods iteratively compute the next policy (κ-PI) and value function (κ-VI) by solving a surrogate decision problem with a shaped reward and a smaller discount factor. We derive model-free RL algorithms based on κ-PI and κ-VI in which the surrogate problem can be solved by any discrete or continuous action RL method, such as DQN and TRPO. We identify the importance of a hyper-parameter that controls the extent to which the surrogate problem is solved and suggest a way to set this parameter. When evaluated on a range of Atari and MuJoCo benchmark tasks, our results indicate that for the right range of κ, our algorithms outperform DQN and TRPO. This shows that our multi-step greedy algorithms are general enough to be applied on top of any existing RL algorithm and can significantly improve its performance.
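To make the surrogate decision problem concrete: in the κ-greedy formulation of Efroni et al., given the current value estimate V, the surrogate MDP uses the shaped reward r(s,a) + (1 − κ)γ E[V(s′)] and the shrunken discount factor κγ, so κ = 0 recovers the standard one-step greedy update and κ = 1 amounts to solving the original MDP. The following is a minimal tabular sketch of κ-PI under that assumption, with plain value iteration standing in for the DQN/TRPO surrogate solver used in the paper; the function and variable names (kappa_pi, P, R) are illustrative and not taken from the authors' released code.

```python
import numpy as np

def kappa_pi(P, R, gamma=0.99, kappa=0.5, outer_iters=30, inner_iters=200):
    """Tabular kappa-PI sketch. P: (S, A, S) transitions, R: (S, A) rewards."""
    S, A = R.shape
    V = np.zeros(S)
    pi = np.zeros(S, dtype=int)
    for _ in range(outer_iters):
        # Surrogate MDP built around the current V:
        # shaped reward r + (1 - kappa) * gamma * E[V(s')], discount kappa * gamma.
        R_shaped = R + (1.0 - kappa) * gamma * (P @ V)        # (S, A)
        # Solve the surrogate (value iteration here; the paper plugs in
        # DQN or TRPO as the solver instead).
        W = np.zeros(S)
        for _ in range(inner_iters):
            W = (R_shaped + kappa * gamma * (P @ W)).max(axis=1)
        # The kappa-greedy policy is the greedy policy of the solved surrogate.
        pi = (R_shaped + kappa * gamma * (P @ W)).argmax(axis=1)
        # Evaluate pi in the *original* MDP to obtain the next outer value.
        P_pi = P[np.arange(S), pi]                            # (S, S)
        R_pi = R[np.arange(S), pi]                            # (S,)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
    return V, pi
```

In this sketch, κ-VI would simply drop the outer policy-evaluation step and set V directly to the surrogate's optimal value W. The hyper-parameter the abstract refers to, controlling the extent to which the surrogate is solved, corresponds roughly to inner_iters here: the compute budget given to the inner solver before each outer update.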
