ResearchTrend.AI
A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games

4 October 2022
Wei Xiong
Han Zhong
Chengshuai Shi
Cong Shen
Tong Zhang

Papers citing "A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games"

13 / 13 papers shown
Cat-and-Mouse Satellite Dynamics: Divergent Adversarial Reinforcement Learning for Contested Multi-Agent Space Operations
Cameron Mehlman
Joseph Abramov
Gregory Falco
AAML
28
0
0
26 Sep 2024
Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning
Qiaosheng Zhang
Chenjia Bai
Shuyue Hu
Zhen Wang
Xuelong Li
37
1
0
30 Apr 2024
Feel-Good Thompson Sampling for Contextual Dueling Bandits
Xuheng Li
Heyang Zhao
Quanquan Gu
40
9
0
09 Apr 2024
Refined Sample Complexity for Markov Games with Independent Linear Function Approximation
Yan Dai
Qiwen Cui
S. S. Du
39
1
0
11 Feb 2024
On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, and Beyond
Thanh Nguyen-Tang
Raman Arora
OffRL
30
3
0
06 Jan 2024
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong
Hanze Dong
Chen Ye
Ziqi Wang
Han Zhong
Heng Ji
Nan Jiang
Tong Zhang
OffRL
38
157
0
18 Dec 2023
Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games
Songtao Feng
Ming Yin
Yu-Xiang Wang
J. Yang
Yingbin Liang
34
0
0
17 Aug 2023
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
Zhihan Liu
Miao Lu
Wei Xiong
Han Zhong
Hao Hu
Shenao Zhang
Sirui Zheng
Zhuoran Yang
Zhaoran Wang
OffRL
32
22
0
29 May 2023
On the Statistical Efficiency of Mean Field Reinforcement Learning with General Function Approximation
Jiawei Huang
Batuhan Yardim
Niao He
34
10
0
18 May 2023
Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback
Yang Cai
Haipeng Luo
Chen-Yu Wei
Weiqiang Zheng
21
17
0
05 Mar 2023
Breaking the Curse of Multiagency: Provably Efficient Decentralized Multi-Agent RL with Function Approximation
Yuanhao Wang
Qinghua Liu
Yu Bai
Chi Jin
19
28
0
13 Feb 2023
A Reduction-based Framework for Sequential Decision Making with Delayed Feedback
Yunchang Yang
Han Zhong
Tianhao Wu
B. Liu
Liwei Wang
S. Du
OffRL
27
8
0
03 Feb 2023
Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration
Priyank Agrawal
Jinglin Chen
Nan Jiang
27
18
0
23 Oct 2020