1210.4843
Cited By
Deterministic MDPs with Adversarial Rewards and Bandit Feedback
16 October 2012
R. Arora
O. Dekel
Ambuj Tewari
Papers citing
"Deterministic MDPs with Adversarial Rewards and Bandit Feedback"
4 / 4 papers shown
Title
Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret
R. Arora
O. Dekel
Ambuj Tewari
OffRL
78
195
0
27 Jun 2012
On the Possibility of Learning in Reactive Environments with Arbitrary Dependence
D. Ryabko
Marcus Hutter
76
24
0
31 Oct 2008
Universal Reinforcement Learning
Vivek F. Farias
C. Moallemi
Tsachy Weissman
Benjamin Van Roy
157
41
0
20 Jul 2007
The on-line shortest path problem under partial monitoring
András György
Tamás Linder
Gábor Lugosi
György Ottucsák
590
355
0
08 Apr 2007