

The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition

8 June 2021
Tiancheng Jin
Longbo Huang
Haipeng Luo
Abstract

We consider the best-of-both-worlds problem for learning an episodic Markov Decision Process through $T$ episodes, with the goal of achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ regret when the losses are adversarial and simultaneously $\mathcal{O}(\mathrm{polylog}(T))$ regret when the losses are (almost) stochastic. Recent work by [Jin and Luo, 2020] achieves this goal when the fixed transition is known, and leaves the case of unknown transition as a major open question. In this work, we resolve this open problem by using the same Follow-the-Regularized-Leader (FTRL) framework together with a set of new techniques. Specifically, we first propose a loss-shifting trick in the FTRL analysis, which greatly simplifies the approach of [Jin and Luo, 2020] and already improves their results for the known transition case. Then, we extend this idea to the unknown transition case and develop a novel analysis which upper bounds the transition estimation error by (a fraction of) the regret itself in the stochastic setting, a key property to ensure $\mathcal{O}(\mathrm{polylog}(T))$ regret.
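The paper's algorithm runs FTRL over occupancy measures of the MDP with unknown transitions. As an illustration only, the sketch below shows the much simpler experts-problem analogue: FTRL with a negative-entropy regularizer over a finite action set, which reduces to exponential weighting of cumulative losses. The function name, learning rate, and loss data are hypothetical and not taken from the paper.

```python
import numpy as np

def ftrl_entropy(loss_matrix, eta):
    """FTRL with a negative-entropy regularizer on the probability simplex.

    At round t the player chooses p_t = argmin_p <L_t, p> + (1/eta) * sum_i p_i log p_i,
    where L_t is the cumulative loss vector; the closed-form solution is the
    exponential-weights distribution p_t ∝ exp(-eta * L_t).
    """
    T, K = loss_matrix.shape
    cum = np.zeros(K)          # cumulative losses L_t
    plays = []
    for t in range(T):
        w = np.exp(-eta * cum)  # closed-form FTRL solution for entropy regularizer
        plays.append(w / w.sum())
        cum += loss_matrix[t]   # observe this round's loss vector
    return np.array(plays)
```

In the paper's setting the decision set is the polytope of occupancy measures rather than a simplex, the losses may be estimated rather than observed, and (as in their loss-shifting trick) a per-round shift of the loss vector leaves the FTRL update unchanged while simplifying the analysis; this sketch only conveys the basic update rule.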
