DRL-ORA: Distributional Reinforcement Learning with Online Risk Adaption

One of the main challenges in reinforcement learning (RL) is that the agent must make decisions that influence future performance without complete knowledge of the environment. Dynamically adjusting the level of epistemic risk during learning can help achieve reliable policies in safety-critical settings more efficiently. In this work, we propose a new framework, Distributional RL with Online Risk Adaptation (DRL-ORA). This framework quantifies both epistemic and implicit aleatory uncertainties in a unified manner and dynamically adjusts the epistemic risk level by solving a total variation minimization problem online. The risk level is selected efficiently via a grid search using a Follow-The-Leader-type algorithm, where the offline oracle corresponds to a "satisficing measure" under a specially modified loss function. We show that DRL-ORA outperforms existing methods that rely on fixed risk levels or manually designed risk-level adaptation across multiple classes of tasks.
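To illustrate the online selection step, the following is a minimal sketch of a Follow-The-Leader-type grid search over candidate risk levels. It is not the paper's implementation: the names `ftl_select_risk_level`, `risk_grid`, and `per_candidate_loss` are hypothetical, and the paper-specific modified loss (the "satisficing measure") is replaced by a placeholder; only the generic FTL rule of picking the candidate with the smallest cumulative loss so far is shown.

```python
import numpy as np

def ftl_select_risk_level(risk_grid, cumulative_losses):
    """Follow-The-Leader over a grid of candidate epistemic risk levels.

    risk_grid         : array of candidate risk levels (e.g. quantile/CVaR-style alphas).
    cumulative_losses : running sums of the losses observed for each candidate so far.

    Returns the candidate with the smallest cumulative loss to date (the "leader").
    """
    return risk_grid[int(np.argmin(cumulative_losses))]


# Hypothetical usage inside a training loop: after each update, the loss of
# every candidate risk level is accumulated and the leader is re-selected.
risk_grid = np.linspace(0.1, 1.0, 10)          # assumed grid of risk levels
cumulative_losses = np.zeros_like(risk_grid)

for step in range(1000):
    # In DRL-ORA this would be the specially modified loss evaluated per
    # candidate; random values stand in here purely for illustration.
    per_candidate_loss = np.random.rand(len(risk_grid))
    cumulative_losses += per_candidate_loss
    current_risk_level = ftl_select_risk_level(risk_grid, cumulative_losses)
```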
@article{wu2025_2310.05179,
  title   = {DRL-ORA: Distributional Reinforcement Learning with Online Risk Adaption},
  author  = {Yupeng Wu and Wenjie Huang and Chin Pang Ho},
  journal = {arXiv preprint arXiv:2310.05179},
  year    = {2025}
}