Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs

Abstract

Recent work has formalized the reward hypothesis through the lens of expected utility theory by interpreting reward as utility. Hausner's foundational work showed that dropping the continuity axiom leads to a generalization of expected utility theory in which utilities are lexicographically ordered vectors of arbitrary dimension. In this paper, we extend this result by identifying a simple and practical condition under which preferences cannot be represented by scalar rewards, necessitating a 2-dimensional reward function. We provide a full characterization of such reward functions, as well as of the general d-dimensional case, in Markov Decision Processes (MDPs) under a memorylessness assumption on preferences. Furthermore, we show that optimal policies in this setting retain many desirable properties of their scalar-reward counterparts, whereas in the Constrained MDP (CMDP) setting, another common multiobjective setting, they do not.
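
To make the lexicographic ordering concrete, here is a minimal Python sketch (not from the paper; the discounting setup, the discounted_return helper, and the trajectories are hypothetical): 2-dimensional returns are compared component-wise, with the first reward dimension taking strict priority and the second breaking ties.

from typing import Sequence, Tuple

def discounted_return(rewards: Sequence[Tuple[float, float]],
                      gamma: float = 0.99) -> Tuple[float, float]:
    # Accumulate a sequence of 2-dimensional rewards into a return vector.
    g1, g2, discount = 0.0, 0.0, 1.0
    for r1, r2 in rewards:
        g1 += discount * r1
        g2 += discount * r2
        discount *= gamma
    return (g1, g2)

# Two hypothetical trajectories: tied on the primary component,
# so the secondary component decides the preference.
traj_a = [(1.0, 0.0), (1.0, 0.5)]
traj_b = [(1.0, 0.2), (1.0, 0.1)]

# Python tuples compare lexicographically, matching the ordering on
# vector-valued utilities: the first component dominates, the second
# breaks ties.
print(discounted_return(traj_a) > discounted_return(traj_b))  # True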

@article{shakerinava2025_2505.12049,
  title={Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs},
  author={Mehran Shakerinava and Siamak Ravanbakhsh and Adam Oberman},
  journal={arXiv preprint arXiv:2505.12049},
  year={2025}
}