Coordination without communication: optimal regret in two players
multi-armed bandits
Annual Conference Computational Learning Theory (COLT), 2020
Abstract
We consider two agents simultaneously playing the same stochastic three-armed bandit problem. The two agents cooperate but cannot communicate. We propose a strategy with no collisions at all between the players (with very high probability) and with near-optimal regret O(√(T log T)). We also argue that the extra logarithmic term is likely necessary by proving a lower bound for a full-information variant of the problem.
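The setting can be illustrated with a minimal simulation sketch. The collision model below (both players receive zero reward when they pull the same arm) and the Bernoulli arm means are illustrative assumptions for the sketch, not details taken from the paper:

```python
import random

def pull(mean):
    """Bernoulli reward with the given mean."""
    return 1.0 if random.random() < mean else 0.0

def play_round(means, arm1, arm2):
    """One round: both players choose an arm simultaneously without
    communicating. On a collision (same arm) both receive zero reward
    (an assumed collision model for illustration)."""
    if arm1 == arm2:
        return 0.0, 0.0
    return pull(means[arm1]), pull(means[arm2])

# Three-armed instance with illustrative means.
means = [0.9, 0.5, 0.1]

# A fixed collision-free assignment (player 1 on arm 0, player 2 on
# arm 1) never collides, so the players collect roughly
# T * (0.9 + 0.5) reward in expectation over T rounds.
random.seed(0)
T = 10_000
total = sum(sum(play_round(means, 0, 1)) for _ in range(T))
```

A strategy such as the paper's must reach a collision-free, near-optimal assignment like this one without the players ever exchanging messages.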
