Locally Constrained Policy Optimization for Online Reinforcement Learning in Non-Stationary Input-Driven Environments
International Conference on Learning Representations (ICLR), 2023
CLLOffRL
Main: 10 pages · Appendix: 17 pages · Bibliography: 8 pages · 10 figures · 12 tables
Abstract
We study online Reinforcement Learning (RL) in non-stationary input-driven environments, where a time-varying exogenous input process affects the environment dynamics. Online RL is challenging in such environments because of catastrophic forgetting (CF): as the agent trains on new experiences, it tends to forget prior knowledge. Prior approaches to mitigating this issue either assume task labels, which are often unavailable in practice, or rely on off-policy methods that can suffer from instability and poor performance.
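To make the setting concrete, here is a minimal sketch of an input-driven environment. All names (`InputDrivenEnv`, the random-walk input, the reward shape) are hypothetical illustrations, not the paper's environments: the key property is that an exogenous process z_t evolves independently of the agent's actions and shifts the reward landscape over time, producing non-stationarity.

```python
import random

class InputDrivenEnv:
    """Toy non-stationary input-driven environment (hypothetical sketch).

    An exogenous input process z_t -- here a bounded Gaussian random walk,
    standing in for e.g. network bandwidth or job arrivals -- drifts on its
    own and changes which actions are rewarding.
    """

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.z = 0.0  # current value of the exogenous input process

    def step(self, action):
        # The input evolves independently: the agent cannot influence it.
        self.z = min(1.0, max(-1.0, self.z + self.rng.gauss(0.0, 0.1)))
        # Reward depends jointly on the action and the current input, so
        # the optimal action drifts as z_t drifts (non-stationarity).
        reward = -abs(action - self.z)
        return self.z, reward

env = InputDrivenEnv(seed=42)
obs, reward = env.step(0.0)
```

An agent trained online in such an environment sees a drifting experience distribution, which is exactly the regime where catastrophic forgetting arises.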