On the Impossibility of Convergence of Mixed Strategies with No Regret Learning

Mathematics of Operations Research (MOR), 2020

3 December 2020

Abstract

We study the limiting behavior of the mixed strategies that result from a general class of optimal no-regret learning strategies in a repeated game setting where the stage game is any 2×2 competitive game, that is, a game in which all Nash equilibria are completely mixed. The game may be zero-sum or non-zero-sum. We consider optimal no-regret strategies that are mean-based (i.e., the information set at each step is the empirical average of the opponent's realized play) and monotonic (either non-decreasing or non-increasing) in their argument. We show that for any such choice of strategies, the limiting mixed strategies of the players cannot converge almost surely to any Nash equilibrium. This negative result is also shown to hold under a broad class of relaxations of these assumptions, which includes popular variants of Online-Mirror-Descent with optimism and/or adaptive step-sizes. Finally, we conjecture that the monotonicity assumption can be removed, and provide partial evidence for this conjecture. Our results identify the inherent stochasticity in players' realizations as a critical factor underlying this divergence, and demonstrate a crucial difference in outcomes between using the opponent's mixtures and using the opponent's realizations to make strategy updates.
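
To make the setting concrete, the following is a minimal simulation sketch, not the paper's construction. It assumes matching pennies as the stage game (a 2×2 game whose unique Nash equilibrium is completely mixed at probability 1/2 for each action) and an illustrative mean-based, monotonic update: each player's probability of Heads is a clipped linear function of the opponent's empirical Heads frequency, scaled by sqrt(t). The function names `mean_based_strategy` and `simulate`, the `sign` parameter, and the sqrt(t) scale are all hypothetical choices made for illustration.

```python
import math
import random


def mean_based_strategy(opp_avg, t, sign=1.0):
    """Probability of playing Heads at round t + 1.

    Depends only on the opponent's empirical Heads frequency `opp_avg`
    (mean-based) and is monotonic in it: non-decreasing for sign=+1,
    non-increasing for sign=-1. The sqrt(t) scale is an assumed choice.
    """
    p = 0.5 + sign * math.sqrt(t) * (opp_avg - 0.5)
    return min(1.0, max(0.0, p))  # clip to a valid probability


def simulate(rounds, seed=0):
    """Play repeated matching pennies; return the mixed strategies per round."""
    rng = random.Random(seed)
    heads1 = heads2 = 0          # counts of realized Heads for each player
    p1 = p2 = 0.5                # both players start at the equilibrium mix
    history = []
    for t in range(1, rounds + 1):
        # Actions are sampled from the mixed strategies: this sampling is
        # the stochasticity in realizations that the abstract refers to.
        a1 = rng.random() < p1
        a2 = rng.random() < p2
        heads1 += a1
        heads2 += a2
        # Player 1 wants to match (non-decreasing in the opponent's average),
        # player 2 wants to mismatch (non-increasing).
        p1 = mean_based_strategy(heads2 / t, t, sign=+1.0)
        p2 = mean_based_strategy(heads1 / t, t, sign=-1.0)
        history.append((p1, p2))
    return history


if __name__ == "__main__":
    for p1, p2 in simulate(10_000, seed=0)[-5:]:
        print(f"p1={p1:.3f}  p2={p2:.3f}")
```

Printing the last few rounds lets one inspect whether the mixed strategies stay near (1/2, 1/2) or keep moving away from it. A single run is only suggestive and does not substitute for the almost-sure statement in the abstract.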
