Towards Theoretical Understandings of Robust Markov Decision Processes: Sample Complexity and Asymptotics

Abstract

In this paper, we study the non-asymptotic and asymptotic performance of the optimal robust policy and value function of robust Markov Decision Processes (MDPs), where the optimal robust policy and value function are estimated only from a generative model. While prior work on the non-asymptotic performance of robust MDPs is restricted to the setting of a KL uncertainty set and the $(s,a)$-rectangular assumption, we improve these results and also consider other uncertainty sets, including $L_1$ and $\chi^2$ balls. Our results show that under the $(s,a)$-rectangular assumption on the uncertainty sets, the sample complexity is roughly $\widetilde{O}\left(\frac{|\mathcal{S}|^2|\mathcal{A}|}{\varepsilon^2\rho^2(1-\gamma)^4}\right)$. In addition, we extend our results from the $(s,a)$-rectangular assumption to the $s$-rectangular assumption. In this scenario, the sample complexity depends on the choice of uncertainty set and is generally larger than under the $(s,a)$-rectangular assumption. Moreover, we show, from both theoretical and empirical perspectives, that the optimal robust value function is asymptotically normal at the typical $\sqrt{n}$ rate under both the $(s,a)$- and $s$-rectangular assumptions.
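To make the setting concrete, below is a minimal sketch (not the paper's exact algorithm) of model-based robust value iteration under a $(s,a)$-rectangular KL uncertainty set: an empirical kernel `P_hat` is built from generative-model samples, and each Bellman backup solves the worst-case expectation over the KL ball of radius `rho` via its scalar dual. The dual-variable grid, iteration count, and function names are illustrative assumptions.

```python
import numpy as np

def kl_worst_case(v, p_hat, rho, lams=np.logspace(-3, 3, 200)):
    """Worst-case expectation min_{p: KL(p||p_hat) <= rho} <p, v> via its
    scalar dual sup_{lam>0} { -lam*rho - lam*log E_{p_hat}[exp(-v/lam)] },
    evaluated on a log-spaced grid (strong duality holds for KL balls)."""
    supp = p_hat > 0                      # feasible p must live on supp(p_hat)
    v_s, p_s = v[supp], p_hat[supp]
    shifted = v_s - v_s.min()             # shift by min(v) for numerical stability
    vals = [v_s.min() - lam * (rho + np.log(p_s @ np.exp(-shifted / lam)))
            for lam in lams]
    return max(vals)

def robust_value_iteration(P_hat, R, rho, gamma=0.9, iters=200):
    """P_hat: empirical kernel of shape (S, A, S), estimated from N
    generative-model draws per (s, a); R: rewards of shape (S, A).
    Returns an approximation of the optimal robust value function."""
    S, A, _ = P_hat.shape
    v = np.zeros(S)
    for _ in range(iters):
        q = np.array([[R[s, a] + gamma * kl_worst_case(v, P_hat[s, a], rho)
                       for a in range(A)] for s in range(S)])
        v = q.max(axis=1)                 # robust Bellman optimality update
    return v
```

The sample-complexity bound above then quantifies how many generative-model draws per $(s,a)$ pair suffice for the value computed from `P_hat` to be $\varepsilon$-close to the true optimal robust value.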
