Near-optimal Regret Bounds for Stochastic Shortest Path

23 February 2020
Alon Cohen, Haim Kaplan, Yishay Mansour, Aviv A. Rosenberg
arXiv:2002.09869
Abstract

Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state with minimum total expected cost. In the learning formulation of the problem, the agent is unaware of the environment dynamics (i.e., the transition function) and has to play repeatedly for a given number of episodes while reasoning about the problem's optimal solution. Unlike other well-studied models in reinforcement learning (RL), the length of an episode is not predetermined (or bounded) and is influenced by the agent's actions. Recently, Tarbouriech et al. (2019) studied this problem in the context of regret minimization and provided an algorithm whose regret bound is inversely proportional to the square root of the minimum instantaneous cost. In this work we remove this dependence on the minimum cost: we give an algorithm that guarantees a regret bound of $\widetilde{O}(B_\star |S| \sqrt{|A| K})$, where $B_\star$ is an upper bound on the expected cost of the optimal policy, $S$ is the set of states, $A$ is the set of actions, and $K$ is the number of episodes. We additionally show that any learning algorithm must have at least $\Omega(B_\star \sqrt{|S| |A| K})$ regret in the worst case.
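To make the setting concrete, below is a minimal Python sketch of the episodic SSP interaction loop the abstract describes. The env and agent interfaces, the function name run_ssp_episodes, and the optimal_expected_cost argument are all hypothetical placeholders, not the authors' algorithm or implementation; the regret computation follows the standard definition (total cost incurred over the $K$ episodes minus $K$ times the optimal policy's expected cost, which $B_\star$ upper-bounds).

# Minimal sketch of the episodic SSP learning loop (hypothetical interface;
# `env` and `agent` are placeholder objects, not the paper's implementation).
def run_ssp_episodes(env, agent, num_episodes, optimal_expected_cost):
    """Play num_episodes episodes; each episode lasts until the goal is
    reached, so its length depends on the agent's actions."""
    total_cost = 0.0
    for _ in range(num_episodes):
        state = env.reset()           # initial state of the episode
        done = False
        while not done:               # episode length is NOT fixed in advance
            action = agent.act(state)
            next_state, cost, done = env.step(action)  # done == goal reached
            agent.observe(state, action, cost, next_state)  # learn dynamics
            total_cost += cost
            state = next_state
    # Regret: cost actually incurred minus K times the optimal policy's
    # expected cost per episode (B_star is an upper bound on the latter).
    return total_cost - num_episodes * optimal_expected_cost

Note how the inner while loop captures the defining feature mentioned above: unlike finite-horizon RL, an episode terminates only when the agent reaches the goal, so a poor policy can make episodes arbitrarily long.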
