
A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Low-Rank MDPs

Main: 11 Pages
Bibliography: 3 Pages
1 Table
Appendix: 15 Pages
Abstract

Offline reinforcement learning (RL) aims to learn a policy that maximizes the expected cumulative reward using a pre-collected dataset. Offline RL with low-rank MDPs or general function approximation has been widely studied recently, but existing algorithms with sample complexity $O(\epsilon^{-2})$ for finding an $\epsilon$-optimal policy either require a uniform data coverage assumption or are computationally inefficient. In this paper, we propose a primal-dual algorithm for offline RL with low-rank MDPs in the discounted infinite-horizon setting. Our algorithm is the first computationally efficient algorithm in this setting that achieves a sample complexity of $O(\epsilon^{-2})$ under a partial data coverage assumption. This improves upon a recent work that requires $O(\epsilon^{-4})$ samples. Moreover, our algorithm extends the previous work to the offline constrained RL setting by supporting constraints on additional reward signals.
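For context, the constrained RL setting the abstract refers to is commonly posed as a Lagrangian saddle-point problem, which is what a primal-dual method optimizes. The sketch below uses standard constrained-MDP notation; the symbols $g_i$ (additional reward signals), $\tau_i$ (constraint thresholds), and $m$ (number of constraints) are illustrative and not taken from the paper itself.

```latex
% Constrained offline RL objective (illustrative notation):
% V_r^\pi denotes the discounted value of policy \pi under reward r;
% g_1, \dots, g_m are additional reward signals with thresholds \tau_i.
\max_{\pi} \; V_r^{\pi}
\quad \text{s.t.} \quad V_{g_i}^{\pi} \ge \tau_i, \quad i = 1, \dots, m.

% A primal-dual method works with the associated Lagrangian
L(\pi, \lambda) \;=\; V_r^{\pi} \;+\; \sum_{i=1}^{m} \lambda_i
\left( V_{g_i}^{\pi} - \tau_i \right), \qquad \lambda \ge 0,
% alternating primal updates on the policy \pi with dual updates
% on the multipliers \lambda, to approximate the saddle point
% \min_{\lambda \ge 0} \max_{\pi} L(\pi, \lambda).
```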
