Deterministic MDPs with Adversarial Rewards and Bandit Feedback


16 October 2012
R. Arora
O. Dekel
Ambuj Tewari

Papers citing "Deterministic MDPs with Adversarial Rewards and Bandit Feedback"

4 / 4 papers shown
Title
Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret
R. Arora
O. Dekel
Ambuj Tewari
OffRL
78
195
0
27 Jun 2012
On the Possibility of Learning in Reactive Environments with Arbitrary Dependence
D. Ryabko
Marcus Hutter
76
24
0
31 Oct 2008
Universal Reinforcement Learning
Vivek F. Farias
C. Moallemi
Tsachy Weissman
Benjamin Van Roy
159
41
0
20 Jul 2007
The on-line shortest path problem under partial monitoring
András György
Tamás Linder
Gábor Lugosi
György Ottucsák
590
355
0
08 Apr 2007