Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

25 July 2022
Shuang Qiu, Xiaohan Wei, Jieping Ye, Zhaoran Wang, Zhuoran Yang
    OffRL
arXiv:2207.12463
Abstract

While single-agent policy optimization in a fixed environment has attracted a lot of research attention recently in the reinforcement learning community, much less is known theoretically when there are multiple agents playing in a potentially competitive environment. We take steps forward by proposing and analyzing new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions. We consider two classes of transition structures: factored independent transitions and single-controller transitions. For both scenarios, we prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario. The regret of each agent is measured against a potentially adversarial opponent who can choose a single best policy in hindsight after observing the full policy sequence. Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play in the setting of simultaneous policy optimization in a non-stationary environment. When both players adopt the proposed algorithms, their overall optimality gap is $\widetilde{\mathcal{O}}(\sqrt{K})$.
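To make the regret criterion concrete, one standard way to formalize the notion described above (not necessarily the paper's exact definition) measures, say, the max-player's policy sequence $\{\mu^k\}$ against the best fixed policy in hindsight, given the opponent's realized sequence $\{\nu^k\}$:

$$\mathrm{Regret}(K) = \max_{\mu} \sum_{k=1}^{K} \left( V^{\mu,\,\nu^k}(s_1) - V^{\mu^k,\,\nu^k}(s_1) \right),$$

where $V^{\mu,\nu}(s_1)$ is the max-player's expected return from the initial state when the players follow $\mu$ and $\nu$.

The sketch below illustrates only the fictitious-play ingredient, on the simplest possible instance: a zero-sum matrix game, i.e., a single-state Markov game with known payoffs. It omits the paper's UCB-type optimism and the unknown structured transitions, and the payoff matrix is a made-up example.

```python
# Minimal sketch: classical fictitious play on a zero-sum matrix game.
# Each player best-responds to the opponent's empirical average strategy.
# This is NOT the paper's algorithm, which additionally uses UCB-type
# optimism and handles unknown structured Markov transitions.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))   # hypothetical payoffs: row player maximizes x^T A y

K = 5000                      # number of rounds (episodes, in the paper's setting)
row_counts = np.zeros(3)      # how often the row player has played each action
col_counts = np.zeros(3)      # how often the column player has played each action
row_counts[0] = col_counts[0] = 1.0   # arbitrary initial plays

for _ in range(K):
    x_bar = row_counts / row_counts.sum()    # row player's empirical mixed strategy
    y_bar = col_counts / col_counts.sum()    # column player's empirical mixed strategy
    row_counts[np.argmax(A @ y_bar)] += 1.0  # best response to opponent's average
    col_counts[np.argmin(x_bar @ A)] += 1.0  # minimizing best response, likewise

x_bar = row_counts / row_counts.sum()
y_bar = col_counts / col_counts.sum()
# Duality gap of the averaged strategies; it shrinks toward zero as K grows,
# echoing (in spirit) the sqrt(K)-type guarantee for the full algorithm.
gap = np.max(A @ y_bar) - np.min(x_bar @ A)
print(f"duality gap after {K} rounds: {gap:.4f}")
```

Running this, the duality gap decays as the empirical strategies approach a Nash equilibrium of $A$; the paper's contribution is showing that a fictitious-play-style update retains $\widetilde{\mathcal{O}}(\sqrt{K})$ regret even when the transitions are unknown and only structurally constrained.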
