We consider the finite-horizon continuous reinforcement learning problem. Our contribution is threefold. First, we give a tractable algorithm for the problem based on optimistic value iteration. Second, we give a lower bound on the regret of any algorithm that discretizes the state space, improving on the previous bound of Ortner and Ryabko \cite{contrl} for the same problem. Third, under the assumption that the rewards and transitions are H\"{o}lder continuous, we establish an upper bound on the discretization error. Finally, we present simple experiments to validate our results.
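To make the general technique concrete, here is a minimal sketch of finite-horizon optimistic value iteration on a uniformly discretized state space. This is not the paper's algorithm, only an illustration of the approach the abstract names: all function names, the $c/\sqrt{n}$ bonus form, and the $[0,1]$ state space are assumptions made for the example.

```python
import numpy as np

def discretize(s, n_bins):
    """Map a state in [0, 1] to one of n_bins uniform intervals (assumed setup)."""
    return min(int(s * n_bins), n_bins - 1)

def optimistic_value_iteration(r_hat, p_hat, counts, H, c=1.0):
    """One round of finite-horizon optimistic value iteration (illustrative sketch).

    r_hat  : (S, A) empirical mean rewards on the discretized MDP
    p_hat  : (S, A, S) empirical transition frequencies
    counts : (S, A) visit counts, used for the exploration bonus
    H      : horizon length
    c      : bonus scaling constant (assumed bonus form: c / sqrt(n))
    Returns Q-values of shape (H, S, A) and a greedy policy of shape (H, S).
    """
    S, A = r_hat.shape
    Q = np.zeros((H, S, A))
    V = np.zeros(S)  # V at step H is zero
    for h in range(H - 1, -1, -1):
        bonus = c / np.sqrt(np.maximum(counts, 1))  # optimism in the face of uncertainty
        Q[h] = r_hat + bonus + p_hat @ V            # optimistic Bellman backup
        Q[h] = np.minimum(Q[h], H - h)              # clip to the maximum achievable return
        V = Q[h].max(axis=1)
    return Q, Q.argmax(axis=2)
```

Under Hölder continuity of rewards and transitions, refining the grid shrinks the discretization error of such a scheme, which is the trade-off the abstract's upper and lower bounds quantify.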