
A Finite-Sample Analysis of Distributionally Robust Average-Reward Reinforcement Learning

Abstract

Robust reinforcement learning (RL) under the average-reward criterion is crucial for long-term decision making under potential environment mismatches, yet its finite-sample complexity remains largely unexplored. Existing works offer algorithms with asymptotic guarantees, but the absence of finite-sample analysis hinders principled understanding and practical deployment, especially in data-limited settings. We close this gap by proposing Robust Halpern Iteration (RHI), the first algorithm with a provable finite-sample complexity guarantee. Under standard uncertainty sets -- including contamination sets and $\ell_p$-norm balls -- RHI attains an $\epsilon$-optimal policy with near-optimal sample complexity of $\tilde{\mathcal O}\left(\frac{SA\mathcal H^{2}}{\epsilon^{2}}\right)$, where $S$ and $A$ denote the numbers of states and actions, and $\mathcal H$ is the robust optimal bias span. This result gives the first polynomial sample complexity guarantee for robust average-reward RL. Moreover, unlike many previous average-reward RL studies, RHI requires no prior knowledge of problem-dependent quantities. Our work thus constitutes a significant step toward making robust average-reward methods practically applicable to complex, real-world problems.
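
The abstract names Halpern iteration as the algorithmic backbone but does not spell out the update rule here. Purely as an illustration of the generic anchoring scheme -- not the paper's RHI algorithm -- below is a minimal Python sketch, where the operator T, the anchor point, and the step schedule beta_k = 1/(k+2) are all assumptions for illustration:

# A minimal, generic sketch of Halpern (anchored) fixed-point iteration,
#   x_{k+1} = beta_k * x0 + (1 - beta_k) * T(x_k),  beta_k = 1/(k+2).
# Illustrative only: T, the anchor, and the schedule are assumptions,
# not the paper's RHI algorithm.
import numpy as np

def halpern_iteration(T, x0, num_iters=1000):
    """Anchored fixed-point iteration toward a fixed point of T."""
    x = np.asarray(x0, dtype=float)
    anchor = x.copy()
    for k in range(num_iters):
        beta = 1.0 / (k + 2)  # standard Halpern anchoring weights
        x = beta * anchor + (1.0 - beta) * T(x)
    return x

# Example: contractive affine map T(x) = 0.9 x + 1 with fixed point x* = 10.
if __name__ == "__main__":
    print(halpern_iteration(lambda x: 0.9 * x + 1.0, x0=np.zeros(1)))  # approaches [10.]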

@article{roch2025_2505.12462,
  title={A Finite-Sample Analysis of Distributionally Robust Average-Reward Reinforcement Learning},
  author={Zachary Roch and Chi Zhang and George Atia and Yue Wang},
  journal={arXiv preprint arXiv:2505.12462},
  year={2025}
}