Improved Analysis of UCRL2 with Empirical Bernstein Inequality
Abstract
We consider the problem of exploration-exploitation in communicating Markov Decision Processes. We provide an analysis of UCRL2 with Empirical Bernstein inequalities (UCRL2B). For any MDP with $S$ states, $A$ actions, $\Gamma \leq S$ next states and diameter $D$, the regret of UCRL2B is bounded as $\widetilde{O}(\sqrt{D \Gamma S A T})$.
