Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP

18 December 2021
Liyu Chen
Rahul Jain
Haipeng Luo
arXiv:2112.09859
Abstract

We introduce two new no-regret algorithms for the stochastic shortest path (SSP) problem with a linear MDP that significantly improve over the only existing results of (Vial et al., 2021). Our first algorithm is computationally efficient and achieves a regret bound $\widetilde{O}\left(\sqrt{d^3B_{\star}^2T_{\star} K}\right)$, where $d$ is the dimension of the feature space, $B_{\star}$ and $T_{\star}$ are upper bounds on the expected cost and hitting time of the optimal policy, respectively, and $K$ is the number of episodes. With a slight modification, the same algorithm also achieves logarithmic regret of order $O\left(\frac{d^3B_{\star}^4}{c_{\min}^2\text{gap}_{\min}}\ln^5\frac{dB_{\star} K}{c_{\min}}\right)$, where $\text{gap}_{\min}$ is the minimum sub-optimality gap and $c_{\min}$ is the minimum cost over all state-action pairs. This result is obtained by developing a simpler and improved analysis of the finite-horizon approximation of (Cohen et al., 2021) with a smaller approximation error, which may be of independent interest. Our second algorithm, by contrast, uses variance-aware confidence sets in a global optimization problem; it is computationally inefficient but achieves the first "horizon-free" regret bound $\widetilde{O}(d^{3.5}B_{\star}\sqrt{K})$, with no polynomial dependence on $T_{\star}$ or $1/c_{\min}$, nearly matching the $\Omega(dB_{\star}\sqrt{K})$ lower bound of (Min et al., 2021).
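To make the comparison between the stated rates concrete, here is a small Python sketch that evaluates each bound with every hidden constant and logarithmic factor set to 1. This is an assumption made purely for illustration: the $\widetilde{O}$, $O$ and $\Omega$ notations hide these terms, so the numbers are not actual regret values. The function names are hypothetical and do not come from the paper. Dividing the two upper bounds gives $\sqrt{d^3B_{\star}^2T_{\star}K}\,/\,(d^{3.5}B_{\star}\sqrt{K}) = \sqrt{T_{\star}}/d^2$, so under this simplification the horizon-free bound is the smaller one exactly when $T_{\star} > d^4$, and it exceeds the lower bound by a factor of $d^{2.5}$.

```python
import math

# Order-of-growth sketch only (assumption): every constant and every logarithmic
# factor hidden by O~(.) is set to 1, so outputs are rates, not real regret values.


def regret_alg1(d: float, b_star: float, t_star: float, k: float) -> float:
    """First (efficient) algorithm: sqrt(d^3 * B*^2 * T* * K)."""
    return math.sqrt(d**3 * b_star**2 * t_star * k)


def regret_alg2(d: float, b_star: float, k: float) -> float:
    """Second (horizon-free) algorithm: d^3.5 * B* * sqrt(K); no T* or 1/c_min term."""
    return d**3.5 * b_star * math.sqrt(k)


def regret_lower_bound(d: float, b_star: float, k: float) -> float:
    """Lower bound attributed to (Min et al., 2021): d * B* * sqrt(K)."""
    return d * b_star * math.sqrt(k)


def regret_log_variant(d: float, b_star: float, c_min: float,
                       gap_min: float, k: float) -> float:
    """Gap-dependent bound of the modified first algorithm:
    d^3 * B*^4 / (c_min^2 * gap_min) * ln^5(d * B* * K / c_min)."""
    return d**3 * b_star**4 / (c_min**2 * gap_min) * math.log(d * b_star * k / c_min) ** 5


def crossover_t_star(d: float) -> float:
    """T* at which the two upper bounds coincide: sqrt(T*) / d^2 = 1, i.e. T* = d^4."""
    return d**4


if __name__ == "__main__":
    d, b_star, k = 4.0, 2.0, 1e6
    # Ratio alg1/alg2 equals sqrt(T*) / d^2: below 1 for small T*, above 1 past d^4.
    for t_star in (10.0, crossover_t_star(d), 1e4):
        ratio = regret_alg1(d, b_star, t_star, k) / regret_alg2(d, b_star, k)
        print(f"T*={t_star:>8.0f}  alg1/alg2={ratio:.3f}")
    # Remaining gap between the horizon-free upper bound and the lower bound.
    gap = regret_alg2(d, b_star, k) / regret_lower_bound(d, b_star, k)
    print(f"alg2 / lower bound = {gap:.1f} (= d^2.5)")
```

The comparison says nothing about computation: per the abstract, the first algorithm is computationally efficient while the second is not, so a smaller horizon-free rate does not by itself make the second algorithm the practical choice.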
