
Convergence of Policy Mirror Descent Beyond Compatible Function Approximation

16 February 2025
Uri Sherman, Tomer Koren, Yishay Mansour
Main: 12 pages, 2 figures, 1 table; Bibliography: 6 pages; Appendix: 30 pages
Abstract

Modern policy optimization methods roughly follow the policy mirror descent (PMD) algorithmic template, for which there are by now numerous theoretical convergence results. However, most of these either target tabular environments, or can be applied effectively only when the class of policies being optimized over satisfies strong closure conditions, which is typically not the case when working with parametric policy classes in large-scale environments. In this work, we develop a theoretical framework for PMD for general policy classes where we replace the closure conditions with a strictly weaker variational gradient dominance assumption, and obtain upper bounds on the rate of convergence to the best-in-class policy. Our main result leverages a novel notion of smoothness with respect to a local norm induced by the occupancy measure of the current policy, and casts PMD as a particular instance of smooth non-convex optimization in non-Euclidean space.
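For context, the generic policy mirror descent template the abstract refers to can be sketched as follows; the notation below is standard but not taken from the paper itself, and is only a minimal illustration of the update rule:

\[
\pi_{t+1}(\cdot \mid s) \in \arg\max_{p \in \Delta(\mathcal{A})} \Big\{ \eta_t \,\big\langle Q^{\pi_t}(s, \cdot),\, p \big\rangle - B_\phi\big(p,\ \pi_t(\cdot \mid s)\big) \Big\},
\]

where \(Q^{\pi_t}\) is the action-value function of the current policy, \(\eta_t\) is a step size, and \(B_\phi\) is a Bregman divergence (taking \(B_\phi\) to be the KL divergence recovers softmax/natural-policy-gradient-style updates). The paper's contribution concerns how this template behaves when the maximization is restricted to a general (e.g., parametric) policy class rather than the full simplex.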

@article{sherman2025_2502.11033,
  title={Convergence of Policy Mirror Descent Beyond Compatible Function Approximation},
  author={Uri Sherman and Tomer Koren and Yishay Mansour},
  journal={arXiv preprint arXiv:2502.11033},
  year={2025}
}