ResearchTrend.AI
Ensuring Monotonic Policy Improvement in Entropy-regularized Value-based Reinforcement Learning

25 August 2020
Lingwei Zhu
Takamitsu Matsubara

Papers citing "Ensuring Monotonic Policy Improvement in Entropy-regularized Value-based Reinforcement Learning"

8 / 8 papers shown
Solving Rubik's Cube with a Robot Hand
OpenAI
Ilge Akkaya
Marcin Andrychowicz
Maciek Chociej
Mateusz Litwin
...
Peter Welinder
Lilian Weng
Qiming Yuan
Wojciech Zaremba
Lei Zhang
ODL
113
1,226
0
16 Oct 2019
Deep Conservative Policy Iteration
Nino Vieillard
Olivier Pietquin
Matthieu Geist
40
26
0
24 Jun 2019
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine
292
8,329
0
04 Jan 2018
Reinforcement Learning with Deep Energy-Based Policies
Tuomas Haarnoja
Haoran Tang
Pieter Abbeel
Sergey Levine
98
1,340
0
27 Feb 2017
Taming the Noise in Reinforcement Learning via Soft Updates
Roy Fox
Ari Pakman
Naftali Tishby
70
338
0
28 Dec 2015
Increasing the Action Gap: New Operators for Reinforcement Learning
Marc G. Bellemare
Georg Ostrovski
A. Guez
Philip S. Thomas
Rémi Munos
68
157
0
15 Dec 2015
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
277
6,764
0
19 Feb 2015
Dynamic Policy Programming
M. G. Azar
Vicenç Gómez
H. Kappen
103
123
0
12 Apr 2010