Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach

10 February 2021
Chen-Yu Wei
Haipeng Luo
arXiv: 2102.05406
Abstract

We propose a black-box reduction that turns a certain reinforcement learning algorithm with optimal regret in a (near-)stationary environment into another algorithm with optimal dynamic regret in a non-stationary environment, importantly without any prior knowledge on the degree of non-stationarity. By plugging different algorithms into our black-box, we provide a list of examples showing that our approach not only recovers recent results for (contextual) multi-armed bandits achieved by very specialized algorithms, but also significantly improves the state of the art for (generalized) linear bandits, episodic MDPs, and infinite-horizon MDPs in various ways. Specifically, in most cases our algorithm achieves the optimal dynamic regret $\widetilde{\mathcal{O}}(\min\{\sqrt{LT}, \Delta^{1/3}T^{2/3}\})$, where $T$ is the number of rounds and $L$ and $\Delta$ are the number and amount of changes of the world respectively, while previous works only obtain suboptimal bounds and/or require the knowledge of $L$ and $\Delta$.
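To make the trade-off between the two terms of this bound concrete, here is a small, purely illustrative Python sketch (the values of T, L, and Delta below are hypothetical and not taken from the paper) that evaluates both terms of the min and returns the smaller one, ignoring the logarithmic factors hidden by the $\widetilde{\mathcal{O}}$ notation.

import math

def dynamic_regret_bound(T, L, Delta):
    """Illustrative evaluation of the dynamic regret bound
    ~O(min{sqrt(L*T), Delta^(1/3) * T^(2/3)}) stated in the abstract.
    Logarithmic factors hidden by the ~O notation are ignored.
    """
    switching_term = math.sqrt(L * T)             # scales with the number of changes L
    drift_term = Delta ** (1 / 3) * T ** (2 / 3)  # scales with the total amount of change Delta
    return min(switching_term, drift_term)

# Hypothetical numbers, purely for illustration:
# with few changes (small L), the sqrt(L*T) term is the smaller one;
# with many changes of small total magnitude (large L, small Delta),
# the Delta^(1/3) * T^(2/3) term wins instead.
print(dynamic_regret_bound(T=10_000, L=5, Delta=50.0))   # sqrt(5 * 10^4) ~= 223.6
print(dynamic_regret_bound(T=10_000, L=500, Delta=2.0))  # 2^(1/3) * 10^(8/3) ~= 584.8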
