Exploration by Optimisation in Partial Monitoring

Abstract
We provide a simple and efficient algorithm for adversarial, non-degenerate, locally observable partial monitoring games with finitely many actions and outcomes. Its bound on the minimax regret over a finite number of rounds matches the best known information-theoretic upper bound. The same algorithm also achieves near-optimal regret for full information, bandit and globally observable games.