Zero-Shot Reinforcement Learning Under Partial Observability

18 June 2025

Main: 9 pages

8 figures

Bibliography: 6 pages

5 tables

Appendix: 8 pages

Abstract

Recent work has shown that, under certain assumptions, zero-shot reinforcement learning (RL) methods can generalise to any unseen task in an environment after reward-free pre-training. Access to Markov states is one such assumption, yet in many real-world applications the Markov state is only partially observable. Here, we explore how the performance of standard zero-shot RL methods degrades when they are subjected to partial observability, and show that, as in single-task RL, memory-based architectures are an effective remedy. We evaluate our memory-based zero-shot RL methods in domains where the states, the rewards and a change in dynamics are partially observed, and show improved performance over memory-free baselines. Our code is open-sourced via this https URL.
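The abstract's central claim, that memory helps when the Markov state is only partially observed, can be illustrated with a toy example. This is a minimal sketch, not the authors' architecture: the scalar recurrent update, its weights `w_h` and `w_x`, the linear policy `act`, and the task vector `z` are all hypothetical choices made for illustration. Two observation histories end in the same aliased observation. An encoder that sees only the latest observation maps them to identical features, so any policy built on it must act identically in both. A recurrent encoder carries information from earlier steps and lets the task-conditioned policy tell them apart.

```python
import math
from typing import List, Sequence


def memory_free_features(observations: Sequence[float]) -> List[float]:
    """Memory-free encoder: uses only the latest observation o_t."""
    return [observations[-1]]


def recurrent_features(observations: Sequence[float],
                       w_h: float = 0.5, w_x: float = 1.0) -> List[float]:
    """Hypothetical toy memory encoder: h_t = tanh(w_h * h_{t-1} + w_x * o_t), h_0 = 0."""
    h = 0.0
    for o in observations:
        h = math.tanh(w_h * h + w_x * o)
    return [h]


def act(features: Sequence[float], z: Sequence[float]) -> int:
    """Hypothetical task-conditioned policy: action 1 if <features, z> >= 0, else 0."""
    score = sum(f * zi for f, zi in zip(features, z))
    return 1 if score >= 0 else 0


# Two histories that end in the same (aliased) observation 0.0.
history_a = [1.0, 0.0]
history_b = [-1.0, 0.0]
z = [1.0]  # hypothetical task vector, assumed to be inferred from reward samples

print(act(memory_free_features(history_a), z), act(memory_free_features(history_b), z))  # 1 1
print(act(recurrent_features(history_a), z), act(recurrent_features(history_b), z))      # 1 0
```

The memory-free features are identical for both histories, so the policy cannot distinguish them. The recurrent hidden state still encodes the sign of the first observation, so the policy chooses different actions for each history.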


@article{jeen2025_2506.15446,
  title={Zero-Shot Reinforcement Learning Under Partial Observability},
  author={Scott Jeen and Tom Bewley and Jonathan M. Cullen},
  journal={arXiv preprint arXiv:2506.15446},
  year={2025}
}
