Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form

29 August 2024

Toshinori Kitamura

Yutaka Matsuo

Abstract

Designing a safe policy for uncertain environments is crucial in real-world control applications. However, this challenge remains inadequately addressed within the Markov decision process (MDP) framework. This paper presents the first algorithm capable of identifying a near-optimal policy in a robust constrained MDP (RCMDP), where an optimal policy minimizes cumulative cost while satisfying constraints in the worst-case scenario across a set of environments. We first prove that the conventional Lagrangian max-min formulation with policy gradient methods can become trapped in suboptimal solutions by encountering a sum of conflicting gradients from the objective and constraint functions during its inner minimization problem. To address this, we leverage the epigraph form of the RCMDP problem, which resolves the conflict by selecting a single gradient from either the objective or the constraints. Building on the epigraph form, we propose a binary search algorithm with a policy gradient subroutine and prove that it identifies an $\varepsilon$-optimal policy in an RCMDP with $\tilde{\mathcal{O}}(\varepsilon^{-4})$ policy evaluations.
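
As a rough illustration of the binary search idea described in the abstract, the following Python sketch works on a toy problem. It is not the authors' algorithm: the policy gradient subroutine is swapped for brute-force enumeration over a small finite set of candidate policies, and the cost table, thresholds, and function names (`worst_case_costs`, `epigraph_gap`, `binary_search_threshold`) are hypothetical. The sketch assumes that the epigraph form introduces a threshold `b0` on the objective and asks whether some policy keeps every worst-case cost at or below its threshold.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy data: COSTS[policy][environment] = (objective cost, constraint cost).
COSTS: List[List[Tuple[float, float]]] = [
    [(1.0, 3.0), (0.8, 2.5)],  # policy 0: cheap, but violates the constraint
    [(2.0, 0.5), (1.5, 0.9)],  # policy 1: feasible, worst-case objective 2.0
    [(4.0, 0.2), (3.0, 0.1)],  # policy 2: feasible, but expensive
]
THRESHOLDS: List[float] = [1.0]  # assumed bound on the single constraint cost


def worst_case_costs(policy_costs: Sequence[Sequence[float]]) -> List[float]:
    """Take the maximum of each cost over all environments for one policy."""
    num_costs = len(policy_costs[0])
    return [max(env[n] for env in policy_costs) for n in range(num_costs)]


def epigraph_gap(costs, thresholds: Sequence[float], b0: float) -> float:
    """Smallest achievable worst violation: min over policies of
    max(J_0 - b0, J_1 - b_1, ...), using worst-case costs J_n."""
    best = float("inf")
    for policy_costs in costs:  # brute force stands in for policy gradient
        worst = worst_case_costs(policy_costs)
        violations = [worst[0] - b0]
        violations += [j - b for j, b in zip(worst[1:], thresholds)]
        best = min(best, max(violations))
    return best


def binary_search_threshold(costs, thresholds, lo: float, hi: float,
                            tol: float = 1e-6) -> float:
    """Shrink [lo, hi] onto the smallest b0 whose epigraph gap is <= 0.
    Assumes the gap is <= 0 at the initial hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if epigraph_gap(costs, thresholds, mid) <= 0.0:
            hi = mid  # some policy meets every threshold: try a smaller b0
        else:
            lo = mid  # no policy meets every threshold: b0 is too small
    return hi


b0_star = binary_search_threshold(COSTS, THRESHOLDS, lo=0.0, hi=5.0)
```

In this toy table, policy 1 is the cheapest policy whose worst-case constraint cost stays within the threshold, so `b0_star` ends up within `tol` of its worst-case objective, 2.0. Each gap evaluation considers only the single largest violation per policy rather than a weighted sum of objective and constraints, which is the property the abstract attributes to the epigraph form.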

@article{kitamura2025_2408.16286,
  title={Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form},
  author={Toshinori Kitamura and Tadashi Kozuno and Wataru Kumagai and Kenta Hoshino and Yohei Hosoe and Kazumi Kasaura and Masashi Hamaya and Paavo Parmas and Yutaka Matsuo},
  journal={arXiv preprint arXiv:2408.16286},
  year={2025}
}
