Online Reinforcement Learning in Non-Stationary Context-Driven Environments

International Conference on Learning Representations (ICLR), 2023
Main: 10 Pages
Bibliography: 8 Pages
Appendix: 17 Pages
10 Figures
12 Tables
Abstract

We study online reinforcement learning (RL) in non-stationary environments, where a time-varying exogenous context process affects the environment dynamics. Online RL is challenging in such environments because of "catastrophic forgetting" (CF): the agent tends to forget prior knowledge as it trains on new experiences. Prior approaches to mitigating this issue assume task labels (which are often unavailable in practice), employ brittle regularization heuristics, or use off-policy methods that suffer from instability and poor performance.