
Policy Gradients Beyond Expectations: Conditional Value-at-Risk

Abstract

Conditional Value at Risk (CVaR) is a prominent risk measure that is used extensively across many domains. In this work, we present a new formula for the gradient of the CVaR in the form of a conditional expectation. Our result is analogous to policy gradients in the reinforcement learning literature. Based on this formula, we propose a novel sampling-based estimator for the CVaR gradient, and a corresponding gradient descent procedure for CVaR optimization. We analyze the bias of the estimator, and prove the convergence of the policy gradient algorithm to a local optimum. In addition, we evaluate our approach by learning a risk-sensitive controller for the game of Tetris, and propose an importance sampling procedure suitable for reinforcement learning.
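To convey the flavor of such an estimator, here is a minimal sketch, not the paper's exact algorithm: it assumes a likelihood-ratio (score-function) setting in which the CVaR gradient is approximated by a conditional expectation over the trajectories whose cost exceeds the empirical alpha-quantile. The helpers `sample_trajectory` and `grad_log_prob` are hypothetical stand-ins for an episode sampler and the usual policy-gradient score function.

```python
import numpy as np

def cvar_policy_gradient(theta, sample_trajectory, grad_log_prob,
                         alpha=0.95, n_samples=1000):
    """Hedged sketch of a sampling-based CVaR gradient estimate.

    sample_trajectory(theta) -> (cost, trajectory): draws one episode
        under the policy parameterized by theta (hypothetical helper).
    grad_log_prob(theta, trajectory) -> ndarray: gradient of the
        log-likelihood of that trajectory (hypothetical helper).
    alpha close to 1 focuses on the worst (1 - alpha) fraction of costs.
    """
    costs, scores = [], []
    for _ in range(n_samples):
        cost, traj = sample_trajectory(theta)
        costs.append(cost)
        scores.append(grad_log_prob(theta, traj))
    costs = np.asarray(costs)            # shape (n_samples,)
    scores = np.asarray(scores)          # shape (n_samples, dim(theta))

    # Empirical alpha-quantile of the cost plays the role of the VaR.
    var_alpha = np.quantile(costs, alpha)

    # Conditional expectation over the tail: only trajectories whose
    # cost exceeds the quantile contribute to the gradient estimate.
    tail = costs >= var_alpha
    grad = np.mean(scores[tail] * (costs[tail] - var_alpha)[:, None], axis=0)
    return grad
```

In a gradient descent loop one would then update `theta -= step_size * grad` to reduce the CVaR of the cost; in practice the paper also addresses the bias introduced by using the empirical quantile and the sample efficiency of estimating a tail quantity, which this sketch ignores.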
