Combining No-regret and Q-learning

7 October 2019

Papers citing "Combining No-regret and Q-learning"

26 / 26 papers shown

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games: Corrections Dustin Morrill Ryan D'Orazio Marc Lanctot J. R. Wright Michael Bowling Amy Greenwald 58 21 0 24 May 2022
Last-iterate convergence rates for min-max optimization Jacob D. Abernethy Kevin A. Lai Andre Wibisono 54 74 0 05 Jun 2019
Stable-Predictive Optimistic Counterfactual Regret Minimization Gabriele Farina Christian Kroer Noam Brown Tuomas Sandholm 56 34 0 13 Feb 2019
Learning to Collaborate in Markov Decision Processes Goran Radanović R. Devidze David C. Parkes Adish Singla 60 33 0 23 Jan 2019
Double Neural Counterfactual Regret Minimization Hui Li Kailiang Hu Zhibang Ge Tao Jiang Yuan Qi Le Song 33 52 0 27 Dec 2018
Regret Circuits: Composability of Regret Minimizers Gabriele Farina Christian Kroer Tuomas Sandholm 21 3 0 06 Nov 2018
Deep Counterfactual Regret Minimization Noam Brown Adam Lerer Sam Gross Tuomas Sandholm 43 213 0 01 Nov 2018
Actor-Critic Policy Optimization in Partially Observable Multiagent Environments S. Srinivasan Marc Lanctot V. Zambaldi Julien Perolat K. Tuyls Rémi Munos Michael Bowling 27 148 0 21 Oct 2018
Solving Imperfect-Information Games via Discounted Regret Minimization Noam Brown Tuomas Sandholm 97 166 0 11 Sep 2018
Last-Iterate Convergence: Zero-Sum Games and Constrained Min-Max Optimization C. Daskalakis Ioannis Panageas 36 178 0 11 Jul 2018
Is Q-learning Provably Efficient? Chi Jin Zeyuan Allen-Zhu Sébastien Bubeck Michael I. Jordan OffRL 44 801 0 10 Jul 2018
Training GANs with Optimism C. Daskalakis Andrew Ilyas Vasilis Syrgkanis Haoyang Zeng 78 514 0 31 Oct 2017
Regret Minimization for Partially Observable Deep Reinforcement Learning Peter H. Jin Kurt Keutzer Sergey Levine 36 51 0 31 Oct 2017
Cycles in adversarial regularized learning P. Mertikopoulos Christos H. Papadimitriou Georgios Piliouras 22 319 0 08 Sep 2017
Monte-Carlo Tree Search by Best Arm Identification E. Kaufmann Wouter M. Koolen 39 37 0 09 Jun 2017
A unified view of entropy-regularized Markov decision processes Gergely Neu Anders Jonsson Vicenç Gómez 82 255 0 22 May 2017
DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker Matej Moravčík Martin Schmid Neil Burch Viliam Lisý Dustin Morrill Nolan Bard Trevor Davis Kevin Waugh Michael Bradley Johanson Michael Bowling BDL 49 904 0 06 Jan 2017
Reset-Free Guided Policy Search: Efficient Deep Reinforcement Learning with Stochastic Initial States William H. Montgomery Anurag Ajay Chelsea Finn Pieter Abbeel Sergey Levine OnRL 51 37 0 04 Oct 2016
On Lower Bounds for Regret in Reinforcement Learning Ian Osband Benjamin Van Roy 52 101 0 09 Aug 2016
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games Johannes Heinrich David Silver SSL 30 397 0 03 Mar 2016
Increasing the Action Gap: New Operators for Reinforcement Learning Marc G. Bellemare Georg Ostrovski A. Guez Philip S. Thomas Rémi Munos 32 156 0 15 Dec 2015
Online Markov decision processes with policy iteration Yao Ma Huatian Zhang Masashi Sugiyama OffRL 24 3 0 15 Oct 2015
Solving Games with Functional Regret Estimation Kevin Waugh Dustin Morrill J. Andrew Bagnell Michael Bowling OffRL 30 58 0 28 Nov 2014
Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions Yasin Abbasi-Yadkori Peter L. Bartlett Csaba Szepesvári 57 86 0 12 Mar 2013
No-Regret Learning in Extensive-Form Games with Imperfect Recall Marc Lanctot Richard G. Gibson Neil Burch Martin A. Zinkevich Michael Bowling OffRL 52 81 0 03 May 2012
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning Stéphane Ross Geoffrey J. Gordon J. Andrew Bagnell OffRL 134 3,196 0 02 Nov 2010