
Deep Reinforcement Learning with Relative Entropy Stochastic Search

22 May 2017
Voot Tangkaratt, A. Abdolmaleki, Masashi Sugiyama
arXiv: 1705.07606
Abstract

Many reinforcement learning methods for continuous control tasks are based on updating a policy function by maximizing an approximated action-value function, or Q-function. However, the Q-function itself depends on the policy, and this dependency often leads to unstable policy learning. To overcome this issue, we propose a method that does not greedily exploit the Q-function. Specifically, we upper-bound the Kullback-Leibler divergence between the new policy and the current policy while maximizing the Q-function, and we also lower-bound the entropy of the new policy to maintain its exploratory behavior. We show that with a Gaussian policy and a Q-function that is quadratic in actions, the corresponding constrained optimization problem can be solved in closed form. In addition, we show that our method can be regarded as a variant of the well-known deterministic policy gradient method. In experiments with a neural network as the function approximator, the proposed method gives more stable learning performance than the deep deterministic policy gradient method and the continuous Q-learning method.
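As a rough illustration of the closed-form update the abstract describes, the sketch below performs a relative-entropy-style Gaussian policy update under a KL upper bound and an entropy lower bound. The quadratic model Q(a) = -1/2 a^T A a + a^T b, the helper name more_gaussian_update, and the fixed values of eta (KL multiplier) and omega (entropy multiplier) are illustrative assumptions, not the paper's exact implementation; in the method itself the multipliers would be found by optimizing the corresponding dual problem rather than fixed by hand.

```python
# Minimal sketch, assuming a Gaussian policy N(mu, Sigma) and a Q-function
# that is quadratic in actions: Q(a) = -0.5 * a^T A a + a^T b.
# The Lagrange multipliers eta (KL constraint) and omega (entropy constraint)
# are fixed here for illustration; in practice they come from a dual optimization.
import numpy as np

def more_gaussian_update(mu, Sigma, A, b, eta, omega):
    """Return the updated Gaussian policy N(mu_new, Sigma_new).

    The constrained optimum is proportional to
        pi_old(a)^(eta / (eta + omega)) * exp(Q(a) / (eta + omega)),
    which for a Gaussian pi_old and quadratic Q is again Gaussian,
    so the update is available in closed form in natural parameters.
    """
    P = np.linalg.inv(Sigma)        # precision of the current policy
    q = P @ mu                      # linear natural parameter
    # Blend old natural parameters with the quadratic Q model.
    P_new = (eta * P + A) / (eta + omega)
    q_new = (eta * q + b) / (eta + omega)
    Sigma_new = np.linalg.inv(P_new)
    mu_new = Sigma_new @ q_new
    return mu_new, Sigma_new

# Toy usage: 2-D action space, current policy N(0, I).
mu, Sigma = np.zeros(2), np.eye(2)
A = np.array([[2.0, 0.0], [0.0, 1.0]])  # curvature of the quadratic Q model
b = np.array([1.0, -0.5])               # linear term of the quadratic Q model
mu_new, Sigma_new = more_gaussian_update(mu, Sigma, A, b, eta=5.0, omega=0.5)
print(mu_new)
print(Sigma_new)
```

Note that the mean update reduces to (eta * Sigma^-1 + A)^-1 (eta * Sigma^-1 mu + b), a regularized step toward the optimum of the quadratic Q model, while the (eta + omega) factor inflates the new covariance, which is how the entropy lower bound keeps the policy exploratory.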
