Bias-reduced multi-step hindsight experience replay

Abstract

Multi-goal reinforcement learning is widely used in planning and robot manipulation, where it faces two main challenges: sparse rewards and sample inefficiency. Hindsight Experience Replay (HER) aims to tackle both challenges with hindsight knowledge. However, HER and its previous variants still require millions of samples and substantial computation. In this paper, we propose Multi-step Hindsight Experience Replay (MHER) based on n-step relabeling, which incorporates multi-step relabeled returns to improve sample efficiency. Despite the advantages of n-step relabeling, we show both theoretically and experimentally that the off-policy n-step bias it introduces may lead to poor performance in many environments. To address this issue, we present two bias-reduced MHER algorithms: MHER(λ) and Model-based MHER (MMHER). MHER(λ) exploits the λ return, while MMHER benefits from model-based value expansions. Experimental results on numerous multi-goal robotic tasks show that our solutions successfully alleviate off-policy n-step bias and achieve significantly higher sample efficiency than HER and Curriculum-guided HER, with little additional computation beyond HER.
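The core ideas in the abstract — recomputing sparse rewards under a relabeled (hindsight) goal, bootstrapping an n-step return, and blending several n-step targets with λ weights — can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the `eps` threshold, the -1/0 sparse-reward convention (as used in HER-style robotic benchmarks), and the normalized λ^(n-1) weighting are assumptions for the example.

```python
import numpy as np

def relabeled_rewards(achieved_goals, goal, eps=0.05):
    """Sparse hindsight rewards under a relabeled goal: 0 when the achieved
    goal is within eps of the goal, else -1 (HER-style convention)."""
    dists = np.linalg.norm(np.asarray(achieved_goals) - np.asarray(goal), axis=-1)
    return -(dists > eps).astype(np.float32)

def n_step_target(rewards, bootstrap_q, n, gamma=0.98):
    """n-step relabeled return from time 0:
    sum_{i<n} gamma^i * r_i + gamma^n * Q(s_n, g),
    where bootstrap_q[n-1] stands in for Q(s_n, g)."""
    n = min(n, len(rewards))
    ret = sum(gamma**i * rewards[i] for i in range(n))
    return ret + gamma**n * bootstrap_q[n - 1]

def lambda_target(rewards, bootstrap_q, n_max, lam=0.7, gamma=0.98):
    """MHER(lambda)-style target (a sketch): exponentially weighted average
    of the 1..n_max-step targets with weights lam**(n-1), normalized so the
    weights sum to 1. Smaller lam leans on short (less biased) horizons."""
    weights = np.array([lam**(n - 1) for n in range(1, n_max + 1)])
    targets = np.array([n_step_target(rewards, bootstrap_q, n, gamma)
                        for n in range(1, n_max + 1)])
    return float(np.dot(weights, targets) / weights.sum())
```

The λ weighting is what trades off the sample-efficiency gain of long multi-step returns against the off-policy bias they accumulate: with λ → 0 the target reduces to the ordinary one-step bootstrap, while λ → 1 weights all horizons up to n_max equally.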
