Order-Optimal Regret with Novel Policy Gradient Approaches in Infinite-Horizon Average Reward MDPs

2 April 2024
Swetha Ganesh
Washim Uddin Mondal
Vaneet Aggarwal
Abstract

We present two Policy Gradient-based algorithms with general parametrization in the context of infinite-horizon average reward Markov Decision Processes (MDPs). The first employs Implicit Gradient Transport for variance reduction, ensuring an expected regret of order $\tilde{\mathcal{O}}(T^{2/3})$. The second, rooted in Hessian-based techniques, ensures an expected regret of order $\tilde{\mathcal{O}}(\sqrt{T})$. These results significantly improve over the state-of-the-art $\tilde{\mathcal{O}}(T^{3/4})$ regret, with the second matching the theoretical lower bound. We also show that the average-reward function is approximately $L$-smooth, a property that earlier works had assumed without proof.
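
For intuition, below is a minimal, self-contained sketch of the two variance-reduction ideas the abstract names, run against a toy objective rather than an actual MDP. Here `grad_fn` and `hvp_fn` are hypothetical stand-ins for stochastic policy-gradient and Hessian-vector-product estimators built from sampled trajectories; the paper's actual estimators, step sizes, and momentum schedules may differ.

```python
import numpy as np


def igt_update(theta, theta_prev, v_prev, grad_fn, gamma, lr):
    """One step of Implicit Gradient Transport (IGT) variance reduction.

    grad_fn is a hypothetical stochastic gradient oracle; in the policy
    optimization setting it would be a sampled policy-gradient estimate.
    """
    # Evaluate the stochastic gradient at an extrapolated point so the
    # stale momentum v_prev is implicitly "transported" to the current
    # iterate, cancelling its bias to first order.
    extrapolated = theta + (gamma / (1.0 - gamma)) * (theta - theta_prev)
    v = gamma * v_prev + (1.0 - gamma) * grad_fn(extrapolated)
    return theta + lr * v, v  # gradient *ascent* on the average reward


def hessian_aided_update(theta, theta_prev, v_prev, hvp_fn, lr, rng):
    """One step of a generic Hessian-aided recursive gradient estimator.

    hvp_fn(x, d) is a hypothetical stochastic Hessian-vector-product
    oracle; only products, never a full Hessian, are needed.
    """
    # Correct the previous estimate with an HVP taken at a uniformly
    # random point on the segment [theta_prev, theta], so the running
    # estimate tracks the true gradient in expectation.
    a = rng.uniform()
    point = theta_prev + a * (theta - theta_prev)
    v = v_prev + hvp_fn(point, theta - theta_prev)
    return theta + lr * v, v


if __name__ == "__main__":
    # Toy stand-in objective: maximize -||theta||^2 with noisy gradients.
    rng = np.random.default_rng(0)
    grad_fn = lambda th: -2.0 * th + 0.1 * rng.normal(size=th.shape)
    hvp_fn = lambda th, d: -2.0 * d  # exact HVP of the toy objective

    theta_prev = theta = rng.normal(size=3)
    v = grad_fn(theta)
    for _ in range(200):
        theta_next, v = igt_update(theta, theta_prev, v, grad_fn,
                                   gamma=0.9, lr=0.05)
        theta_prev, theta = theta, theta_next
    print("IGT: final |theta| =", np.linalg.norm(theta))

    theta_prev = theta = rng.normal(size=3)
    v = grad_fn(theta)
    for _ in range(200):
        theta_next, v = hessian_aided_update(theta, theta_prev, v, hvp_fn,
                                             lr=0.05, rng=rng)
        theta_prev, theta = theta, theta_next
    print("Hessian-aided: final |theta| =", np.linalg.norm(theta))
```

Both estimators require only gradient or Hessian-vector-product oracles, the latter computable at roughly the cost of a gradient, so neither sketch ever forms a full Hessian.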

@article{ganesh2025_2404.02108,
  title={Order-Optimal Regret with Novel Policy Gradient Approaches in Infinite-Horizon Average Reward MDPs},
  author={Swetha Ganesh and Washim Uddin Mondal and Vaneet Aggarwal},
  journal={arXiv preprint arXiv:2404.02108},
  year={2025}
}