Improved Analysis of UCRL2 with Empirical Bernstein Inequality

Abstract

We consider the problem of exploration-exploitation in communicating Markov Decision Processes. We provide an analysis of UCRL2 with Empirical Bernstein inequalities (UCRL2B). For any MDP with $S$ states, $A$ actions, $\Gamma \leq S$ next states and diameter $D$, the regret of UCRL2B is bounded as $\widetilde{O}(\sqrt{D\Gamma S A T})$.
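
To give a feel for how the stated bound scales, here is a minimal sketch that evaluates only the leading term $\sqrt{D\Gamma S A T}$. The function name `regret_leading_term` is hypothetical, and the constants and logarithmic factors hidden by $\widetilde{O}$ are deliberately ignored, so the output is a scaling illustration rather than an actual regret guarantee.

```python
import math


def regret_leading_term(D: float, Gamma: int, S: int, A: int, T: int) -> float:
    """Leading term sqrt(D * Gamma * S * A * T) of the stated bound.

    Hypothetical helper for illustration only: constants and log factors
    hidden by the O-tilde notation are NOT included.
    """
    if Gamma > S:
        # The abstract assumes Gamma <= S (at most S possible next states).
        raise ValueError("Gamma must not exceed S")
    return math.sqrt(D * Gamma * S * A * T)


# Quadrupling the horizon T doubles the leading term (square-root growth in T).
base = regret_leading_term(D=4, Gamma=2, S=2, A=1, T=1)
longer = regret_leading_term(D=4, Gamma=2, S=2, A=1, T=4)
```

Because the term grows as $\sqrt{T}$, the per-step average of this quantity shrinks as the horizon grows.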
