arXiv: 2205.03819

Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods

8 May 2022
Qing Li
Wen-gang Zhou
Zhenbo Lu
Houqiang Li
    OffRL
Abstract

Actor-critic Reinforcement Learning (RL) algorithms have achieved impressive performance in continuous control tasks. However, they still suffer from two nontrivial obstacles: low sample efficiency and overestimation bias. To address them, we propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL). SDQ-CAL strengthens Double Q-learning for off-policy actor-critic RL by modifying the Bellman optimality operator with Advantage Learning. Specifically, SDQ-CAL improves sample efficiency by modifying the reward so that optimal actions are easier to distinguish from the others in collected experience. In addition, it mitigates overestimation by updating a pair of critics simultaneously using double estimators. Extensive experiments show that our algorithm yields less biased value estimates and achieves state-of-the-art performance on a range of continuous control benchmark tasks. We release the source code of our method at https://github.com/LQNew/SDQ-CAL.
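The abstract gives no formulas, so the two ideas it names can only be illustrated loosely. The sketch below assumes the classic advantage-learning form, in which the reward is reduced by a multiple of the action gap, and a cross-critic scheme in which each critic's target is built from the other critic's estimates. The function names, the coefficient `beta`, and the cross-critic pairing are illustrative assumptions, not the released SDQ-CAL implementation, and the "conservative" part of the method is not modeled here.

```python
from typing import Tuple

# (Q(s, a), Q(s, pi(s)), Q(s', pi(s'))) as estimated by one critic.
Estimates = Tuple[float, float, float]


def advantage_learning_target(
    reward: float,
    q_sa: float,       # Q(s, a) for the action stored in the replay sample
    q_s_pi: float,     # Q(s, pi(s)): value of the actor's action at s
    q_next_pi: float,  # Q(s', pi(s')): bootstrap value at the next state
    gamma: float = 0.99,
    beta: float = 0.5,  # hypothetical advantage-learning coefficient
    done: bool = False,
) -> float:
    """Generic advantage-learning target (assumed form, not the paper's)."""
    # Subtract a multiple of the action gap from the reward, so actions
    # that look worse than the actor's choice receive a lower target.
    shaped_reward = reward - beta * (q_s_pi - q_sa)
    bootstrap = 0.0 if done else gamma * q_next_pi
    return shaped_reward + bootstrap


def simultaneous_targets(
    reward: float,
    critic1: Estimates,
    critic2: Estimates,
    gamma: float = 0.99,
    beta: float = 0.5,
    done: bool = False,
) -> Tuple[float, float]:
    """Build both critics' targets in one step (assumed cross-critic pairing)."""
    # Each critic is regressed toward a target computed from the OTHER
    # critic's estimates, in the spirit of double estimators.
    target_for_1 = advantage_learning_target(
        reward, *critic2, gamma=gamma, beta=beta, done=done
    )
    target_for_2 = advantage_learning_target(
        reward, *critic1, gamma=gamma, beta=beta, done=done
    )
    return target_for_1, target_for_2
```

For example, `simultaneous_targets(1.0, (1.0, 2.0, 3.0), (0.5, 1.5, 2.0), gamma=0.5, beta=0.5)` shapes the reward to 0.5 for both critics and then adds each critic's discounted bootstrap value from the other critic. The authors' repository linked above is the reference for the actual update rules.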
