Deterministic MDPs with Adversarial Rewards and Bandit Feedback

16 October 2012

Papers citing "Deterministic MDPs with Adversarial Rewards and Bandit Feedback"


Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret. R. Arora, O. Dekel, Ambuj Tewari. 27 Jun 2012.
On the Possibility of Learning in Reactive Environments with Arbitrary Dependence. D. Ryabko, Marcus Hutter. 31 Oct 2008.
Universal Reinforcement Learning. Vivek F. Farias, C. Moallemi, Tsachy Weissman, Benjamin Van Roy. 20 Jul 2007.
The on-line shortest path problem under partial monitoring. András György, Tamás Linder, Gábor Lugosi, György Ottucsák. 08 Apr 2007.