$O(T^{-1})$ Convergence of Optimistic-Follow-the-Regularized-Leader in Two-Player Zero-Sum Markov Games

26 September 2022
Yuepeng Yang
Cong Ma
Abstract

We prove that optimistic-follow-the-regularized-leader (OFTRL), together with smooth value updates, finds an $O(T^{-1})$-approximate Nash equilibrium in $T$ iterations for two-player zero-sum Markov games with full information. This improves upon the $\tilde{O}(T^{-5/6})$ convergence rate recently shown by Zhang et al. (2022). The refined analysis hinges on two essential ingredients. First, the sum of the regrets of the two players, though not necessarily non-negative as in normal-form games, is approximately non-negative in Markov games. This property allows us to bound the second-order path lengths of the learning dynamics. Second, we prove a tighter algebraic inequality regarding the weights deployed by OFTRL that shaves off an extra $\log T$ factor. This crucial improvement enables the inductive analysis that leads to the final $O(T^{-1})$ rate.
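For intuition, the snippet below is a minimal sketch of OFTRL with an entropy regularizer on a zero-sum normal-form game (rock-paper-scissors). The learning rate, the game, and the use of the most recent loss vector as the optimistic prediction are illustrative assumptions; the paper's setting additionally involves per-state smooth value updates in a Markov game, which are not modeled here.

```python
import numpy as np

# Loss matrix for the row player in rock-paper-scissors (zero-sum):
# the row player minimizes x^T A y, the column player maximizes it.
A = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])

def oftrl(cum_loss, pred_loss, eta):
    # OFTRL with an entropy regularizer reduces to a softmax over the
    # negated cumulative loss plus an optimistic prediction of the next loss.
    logits = -eta * (cum_loss + pred_loss)
    logits -= logits.max()          # numerical stability
    w = np.exp(logits)
    return w / w.sum()

T, eta = 5000, 0.1                  # illustrative horizon and learning rate
m, n = A.shape
cum_x = np.zeros(m); cum_y = np.zeros(n)
last_x = np.zeros(m); last_y = np.zeros(n)
avg_x = np.zeros(m); avg_y = np.zeros(n)

for _ in range(T):
    # Standard optimism: predict that the next loss equals the most recent one.
    x = oftrl(cum_x, last_x, eta)
    y = oftrl(cum_y, last_y, eta)
    loss_x = A @ y                  # row player's loss vector given y
    loss_y = -A.T @ x               # column player's loss vector given x
    cum_x += loss_x; last_x = loss_x
    cum_y += loss_y; last_y = loss_y
    avg_x += x; avg_y += y

avg_x /= T; avg_y /= T
# Duality gap (exploitability) of the averaged strategies; it shrinks
# as T grows, mirroring the average-iterate convergence the paper studies.
gap = (avg_x @ A).max() - (A @ avg_y).min()
print(f"duality gap of average iterates after {T} rounds: {gap:.2e}")
```

The average iterates are what converge to a Nash equilibrium here; the entropy regularizer makes each OFTRL step a closed-form softmax, which is why no explicit argmin solver is needed.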
