Best of Both Worlds: Regret Minimization versus Minimax Play

17 February 2025

Main: 8 pages; 3 figures; Bibliography: 5 pages; Appendix: 14 pages

Abstract

In this paper, we investigate the existence of online learning algorithms with bandit feedback that simultaneously guarantee $O(1)$ regret compared to a given comparator strategy and $\tilde{O}(\sqrt{T})$ regret compared to any fixed strategy, where $T$ is the number of rounds. We provide the first affirmative answer to this question whenever the comparator strategy supports every action. In the context of zero-sum games with min-max value zero, in both normal and extensive form, we show that our results guarantee that we risk at most $O(1)$ loss while still being able to gain $\Omega(T)$ from exploitable opponents, thereby combining the benefits of both no-regret algorithms and minimax play.
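
To make the two guarantees concrete, here is a sketch in notation that is assumed for illustration and not taken from the paper: $\ell_t$ is the loss vector in round $t$, $\pi_t$ the learner's mixed strategy, $\mu$ the given comparator strategy, and $R_T(\pi)$ the expected regret against a fixed strategy $\pi$.

$$
R_T(\pi) = \mathbb{E}\left[\sum_{t=1}^{T} \langle \pi_t - \pi, \ell_t \rangle\right],
\qquad
R_T(\mu) = O(1),
\qquad
\max_{\pi} R_T(\pi) = \tilde{O}(\sqrt{T}).
$$

Under these assumptions, if $\mu$ is a minimax strategy of a zero-sum game with value zero, then $\langle \mu, \ell_t \rangle \le 0$ against any opponent action, so the learner's expected cumulative loss satisfies

$$
\mathbb{E}\left[\sum_{t=1}^{T} \langle \pi_t, \ell_t \rangle\right]
= R_T(\mu) + \mathbb{E}\left[\sum_{t=1}^{T} \langle \mu, \ell_t \rangle\right]
\le O(1).
$$

Likewise, if some fixed strategy $\pi$ would incur loss at most $-cT$ against the opponent for a constant $c > 0$, the second guarantee gives the learner a loss of at most $-cT + \tilde{O}(\sqrt{T})$, which is a gain of $\Omega(T)$.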

@article{müller2025_2502.11673,
  title={Best of Both Worlds: Regret Minimization versus Minimax Play},
  author={Adrian Müller and Jon Schneider and Stratis Skoulakis and Luca Viano and Volkan Cevher},
  journal={arXiv preprint arXiv:2502.11673},
  year={2025}
}
