Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates

14 May 2019

Papers citing "Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates"

JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games
    Yang Li, Kun Xiong, Yingping Zhang, Jiangcheng Zhu, Stephen Marcus McAleer, Wei Pan, Jun Wang, Zonghong Dai, Yaodong Yang
    44 2 0 | 09 Aug 2023

Monte Carlo Tree Search: A Review of Recent Modifications and Applications
    M. Świechowski, Konrad Godlewski, B. Sawicki, Jacek Mańdziuk
    46 252 0 | 08 Mar 2021

Foundations of Digital Archæoludology
    C. Browne, Dennis J. N. J. Soemers, Éric Piette, Matthew Stephenson, Michael Conrad, ..., Abdallah Saffidine, Ulrich Schädler, Jorge Nuno Silva, A. Voogt, M. Winands
    AI4CE | 22 8 0 | 31 May 2019

Off-Policy Actor-Critic
    T. Degris, Martha White, R. Sutton
    OffRL CML | 163 220 0 | 22 May 2012