Online Learning with Bounded Recall

Jon Schneider
Kiran Vodrahalli
Abstract

We study the problem of full-information online learning in the "bounded recall" setting popular in the study of repeated games. An online learning algorithm $\mathcal{A}$ is $M$-\textit{bounded-recall} if its output at time $t$ can be written as a function of the $M$ previous rewards (and not, e.g., any other internal state of $\mathcal{A}$). We first demonstrate that a natural approach to constructing bounded-recall algorithms from mean-based no-regret learning algorithms (e.g., running Hedge over the last $M$ rounds) fails, and that any such algorithm incurs constant regret per round. We then construct a stationary bounded-recall algorithm that achieves a per-round regret of $\Theta(1/\sqrt{M})$, which we complement with a tight lower bound. Finally, we show that unlike the perfect-recall setting, any low-regret bounded-recall algorithm must be aware of the ordering of the past $M$ losses: any bounded-recall algorithm which plays a symmetric function of the past $M$ losses must incur constant regret per round.
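As a concrete illustration of the windowed construction the abstract refers to, below is a minimal sketch of Hedge run over only the last $M$ loss vectors. The function name `windowed_hedge_weights` and the learning-rate choice are our own assumptions, not from the paper; note that the paper's first result is precisely that this natural approach incurs constant regret per round.

```python
import numpy as np

def windowed_hedge_weights(loss_window, eta):
    """Hedge weights computed only from the last M loss vectors.

    loss_window: array of shape (M, K) holding the M most recent loss
    vectors over K actions -- the only state a bounded-recall algorithm
    may use.
    eta: learning rate.

    Returns a probability distribution over the K actions.
    """
    cumulative = loss_window.sum(axis=0)   # total loss per action over the window
    logits = -eta * cumulative
    logits -= logits.max()                 # shift for numerical stability
    w = np.exp(logits)
    return w / w.sum()

# Hypothetical usage: K = 3 actions, recall M = 5, losses in [0, 1].
rng = np.random.default_rng(0)
M, K = 5, 3
window = rng.random((M, K))
p = windowed_hedge_weights(window, eta=np.sqrt(np.log(K) / M))
print(p)  # a function of the last M losses only, as bounded recall requires
```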
