This paper examines the impact of agents' myopic optimization on the efficiency of systems comprised by many selfish agents. In contrast to standard congestion games where agents interact in a one-shot fashion, in our model each agent chooses an infinite sequence of actions and maximizes the total reward stream discounted over time under different ways of computing present values. Our model assumes that actions consume common resources that get congested, and the action choice by an agent affects the completion times of actions chosen by other agents, which in turn affects the time rewards are accrued and their discounted value. This is a mean-field game, where an agent's reward depends on the decisions of the other agents through the resulting action completion times. For this type of game we define stationary equilibria, and analyze their existence and price of anarchy (PoA). Overall, we find that the PoA depends entirely on the type of discounting rather than its specific parameters. For exponential discounting, myopic behaviour leads to extreme inefficiency: the PoA is infinity for any value of the discount parameter. For power law discounting, such inefficiency is greatly reduced and the PoA is 2 whenever stationary equilibria exist. This matches the PoA when there is no discounting and players maximize long-run average rewards. Additionally, we observe that exponential discounting may introduce unstable equilibria in learning algorithms, if action completion times are interdependent. In contrast, under no discounting all equilibria are stable.
View on arXiv@article{li2025_2504.20774, title={ On the Effect of Time Preferences on the Price of Anarchy }, author={ Yunpeng Li and Antonis Dimakis and Costas A. Courcoubetis }, journal={arXiv preprint arXiv:2504.20774}, year={ 2025 } }