Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space

21 February 2023
Jonatha Anselmi
B. Gaujal
Louis-Sébastien Rebuffi
Abstract

In this paper, we revisit the regret of undiscounted reinforcement learning in MDPs with a birth and death structure. Specifically, we consider a controlled queue with impatient jobs, and the main objective is to optimize a trade-off between energy consumption and user-perceived performance. Within this setting, the \emph{diameter} $D$ of the MDP is $\Omega(S^S)$, where $S$ is the number of states. Therefore, the existing lower and upper bounds on the regret at time $T$, of order $O(\sqrt{DSAT})$ for MDPs with $S$ states and $A$ actions, may suggest that reinforcement learning is inefficient here. In our main result, however, we exploit the structure of our MDPs to show that the regret of a slightly-tweaked version of the classical learning algorithm {\sc Ucrl2} is in fact upper bounded by $\tilde{\mathcal{O}}(\sqrt{E_2 A T})$, where $E_2$ is related to the weighted second moment of the stationary measure of a reference policy. Importantly, $E_2$ is bounded independently of $S$. Thus, our bound is asymptotically independent of the number of states and of the diameter. This result is based on a careful study of the number of visits performed by the learning algorithm to the states of the MDP, which is highly non-uniform.
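To make the setting concrete, below is a minimal sketch (not the authors' code) of the kind of birth-and-death MDP the abstract describes: a single queue in which jobs arrive, may abandon while waiting (impatience), and the action selects a service speed, paying an energy cost for faster service and a holding cost per waiting job as a proxy for user-perceived performance. All rates, costs, the state cap, and the class/parameter names are illustrative assumptions, not values from the paper.

```python
import random


class BirthDeathQueueMDP:
    """Toy birth-and-death MDP: a controlled queue with impatient jobs.

    State  s in {0, ..., S-1}: number of jobs in the queue.
    Action a: index of a service speed; faster speeds cost more energy.
    Cost   : energy(a) + holding cost per waiting job.
    All numerical values are illustrative assumptions.
    """

    def __init__(self, S=50, arrival_rate=1.0, impatience_rate=0.1,
                 service_speeds=(0.0, 0.5, 1.0, 2.0),
                 energy_cost=(0.0, 0.2, 0.5, 1.5), holding_cost=0.1):
        self.S = S
        self.lam = arrival_rate
        self.theta = impatience_rate          # per-job abandonment rate
        self.speeds = service_speeds
        self.energy = energy_cost
        self.h = holding_cost

    def step(self, s, a):
        """Uniformized one-step transition: returns (next_state, cost)."""
        mu = self.speeds[a]
        # Birth rate (arrival) and death rate (service + abandonments).
        up = self.lam if s < self.S - 1 else 0.0
        down = (mu + s * self.theta) if s > 0 else 0.0
        # Uniformization constant: upper bound on the total event rate.
        C = self.lam + max(self.speeds) + (self.S - 1) * self.theta
        u = random.random() * C
        if u < up:
            s_next = s + 1          # arrival
        elif u < up + down:
            s_next = s - 1          # departure (service or abandonment)
        else:
            s_next = s              # dummy self-loop from uniformization
        cost = self.energy[a] + self.h * s
        return s_next, cost


# Simulate a simple reference policy: full speed whenever the queue is non-empty.
mdp = BirthDeathQueueMDP()
s, total_cost = 0, 0.0
for _ in range(10_000):
    a = len(mdp.speeds) - 1 if s > 0 else 0
    s, c = mdp.step(s, a)
    total_cost += c
print(f"average cost per step: {total_cost / 10_000:.3f}")
```

A learning algorithm such as the tweaked Ucrl2 studied in the paper would interact with the queue only through an interface like `step`, without knowing the rates; the abstract's key observation is that the resulting visits to the states are highly non-uniform, which is what allows the regret bound to escape the dependence on $S$ and $D$.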
