Beyond-demonstrator (BD) inverse reinforcement learning (IRL) is an ambitious extension of IRL that aims not only to learn from demonstrations but to outperform the demonstrator. BD-IRL offers a new way to build expert systems: it sidesteps the difficulty of hand-designing reward functions and reduces computational cost. Most existing BD-IRL algorithms are two-stage, first inferring a reward function and then learning a policy with reinforcement learning (RL). However, the two separate procedures incur high computational complexity, low robustness, and additional variance; if the first stage produces a poor reward function, the second stage can hardly learn a satisfactory policy. This paper proposes a BD-IRL framework, hybrid adversarial inverse reinforcement learning (HAIRL), to overcome these flaws. HAIRL integrates reward learning and policy optimization into a single procedure, reducing computational complexity. Moreover, it dynamically updates the reward function as learning progresses, making the method more adaptive and robust. Simulation results show that HAIRL outperforms similar state-of-the-art (SOTA) algorithms.
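To make the single-procedure idea concrete, the sketch below shows a generic adversarial-IRL-style training loop in which the learned reward (a discriminator) and the policy are updated inside the same loop, so the reward keeps changing as the policy improves. This is a minimal illustration under assumptions of my own, not the authors' HAIRL implementation: the network sizes, the toy dimensions, the placeholder `sample_batch` function, and the crude score-function policy surrogate are all hypothetical stand-ins (a real implementation would roll out an environment and use PPO/SAC for the policy step).

```python
# Hypothetical sketch of a single-loop adversarial IRL update: the reward
# network and the policy are optimized in one procedure, so the reward is
# refreshed continuously as learning proceeds. Not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim = 4, 2  # assumed toy dimensions

policy = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, act_dim))
reward_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.Tanh(), nn.Linear(32, 1))
opt_pi = torch.optim.Adam(policy.parameters(), lr=3e-4)
opt_r = torch.optim.Adam(reward_net.parameters(), lr=3e-4)

def sample_batch(n=64):
    """Placeholder: stands in for reading demonstrations / rolling out the policy."""
    obs = torch.randn(n, obs_dim)
    act = torch.randn(n, act_dim)
    return torch.cat([obs, act], dim=-1)

for step in range(1000):
    demo, agent = sample_batch(), sample_batch()

    # 1) Reward (discriminator) update: score demonstration transitions
    #    above the agent's own transitions (logistic loss).
    d_loss = F.softplus(-reward_net(demo)).mean() + F.softplus(reward_net(agent)).mean()
    opt_r.zero_grad(); d_loss.backward(); opt_r.step()

    # 2) Policy update against the *current* learned reward, in the same loop.
    #    A crude score-function surrogate is used here purely for illustration.
    obs = agent[:, :obs_dim]
    logits = policy(obs)
    r = reward_net(torch.cat([obs, torch.tanh(logits)], dim=-1)).detach()
    pi_loss = -(r * logits.sum(dim=-1, keepdim=True)).mean()
    opt_pi.zero_grad(); pi_loss.backward(); opt_pi.step()
```

The point of the interleaving is that, unlike a two-stage pipeline, the policy never commits to a frozen (and possibly poor) reward: both components co-adapt throughout training.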