Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback

5 March 2023

Papers citing "Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback"

Sensor Scheduling in Intrusion Detection Games with Uncertain Payoffs. Jayanth Bhargav, Shreyas Sundaram, Mahsa Ghasemi. 20 Apr 2025.
Decentralized Online Learning in General-Sum Stackelberg Games. Yaolong Yu, Haipeng Chen. 06 May 2024.
A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees. Toshinori Kitamura, Tadashi Kozuno, Masahiro Kato, Yuki Ichihara, Soichiro Nishimori, Akiyoshi Sannai, Sho Sonoda, Wataru Kumagai, Yutaka Matsuo. 31 Jan 2024.
A Minimaximalist Approach to Reinforcement Learning from Human Feedback. Gokul Swamy, Christoph Dann, Rahul Kidambi, Zhiwei Steven Wu, Alekh Agarwal. 08 Jan 2024.
Multi-Player Zero-Sum Markov Games with Networked Separable Interactions. Chanwoo Park, Kaiqing Zhang, Asuman Ozdaglar. 13 Jul 2023.
Doubly Optimal No-Regret Learning in Monotone Games. Yang Cai, Weiqiang Zheng. 30 Jan 2023.
A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games. Wei Xiong, Han Zhong, Chengshuai Shi, Cong Shen, Tong Zhang. 04 Oct 2022.
Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games. Shicong Cen, Yuejie Chi, S. Du, Lin Xiao. 03 Oct 2022.
$O(T^{-1})$ Convergence of Optimistic-Follow-the-Regularized-Leader in Two-Player Zero-Sum Markov Games. Yuepeng Yang, Cong Ma. 26 Sep 2022.
Uncoupled Bandit Learning towards Rationalizability: Benchmarks, Barriers, and Algorithms. Jibang Wu, Haifeng Xu, Fan Yao. 10 Nov 2021.
Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games. Yulai Zhao, Yuandong Tian, Jason D. Lee, S. Du. 17 Feb 2021.
Independent Policy Gradient Methods for Competitive Reinforcement Learning. C. Daskalakis, Dylan J. Foster, Noah Golowich. 11 Jan 2021.