Deep reinforcement learning (DRL) algorithms have proven effective in robot navigation, especially in unknown environments, by directly mapping perception inputs to robot control commands. However, most existing methods ignore the local minimum problem in navigation and therefore cannot handle complex unknown environments. In this paper, we propose Adaptive Forward Simulation Time (AFST), the first DRL-based navigation method modeled as a semi-Markov decision process (SMDP) with a continuous action space, to overcome this problem. Specifically, we improve the distributed proximal policy optimization (DPPO) algorithm for this SMDP setting by modifying its generalized advantage estimation (GAE) to better estimate the policy gradient in SMDPs. We evaluate our approach both in simulation and in the real world.
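To illustrate why GAE needs adjusting in an SMDP, the following is a minimal sketch of one common way to make advantage estimation duration-aware. It is not the paper's formulation, which the abstract does not spell out. The function name `smdp_gae`, its parameters, and the choice to discount each decision step by `gamma ** duration` (with the GAE trace decayed by `lam` once per decision) are all assumptions made for illustration.

```python
from typing import List, Sequence


def smdp_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    durations: Sequence[float],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Duration-aware GAE sketch (hypothetical, not the paper's exact method).

    rewards[t]   : reward accumulated over decision step t
    values[t]    : value estimate at the start of step t; values[-1] is the
                   bootstrap value after the final step
    durations[t] : how long action t was executed (the variable "forward
                   simulation time" of an SMDP step)
    """
    n = len(rewards)
    if len(values) != n + 1 or len(durations) != n:
        raise ValueError("expected len(values) == len(rewards) + 1 == len(durations) + 1")

    advantages = [0.0] * n
    running = 0.0
    for t in reversed(range(n)):
        # Assumption: the discount applied across a step depends on its duration.
        discount = gamma ** durations[t]
        delta = rewards[t] + discount * values[t + 1] - values[t]
        # Assumption: the GAE trace decays by lam once per decision step.
        running = delta + discount * lam * running
        advantages[t] = running
    return advantages
```

When every duration equals 1, this reduces to standard GAE, so the sketch only changes behavior when actions last for different amounts of time.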
