6
36

Robust Multi-Agent Multi-Armed Bandits

Abstract

Recent works have shown that agents facing independent instances of a stochastic KK-armed bandit can collaborate to decrease regret. However, these works assume that each agent always recommends their individual best-arm estimates to other agents, which is unrealistic in envisioned applications (machine faults in distributed computing or spam in social recommendation systems). Hence, we generalize the setting to include nn honest and mm malicious agents who recommend best-arm estimates and arbitrary arms, respectively. We first show that even with a single malicious agent, existing collaboration-based algorithms fail to improve regret guarantees over a single-agent baseline. We propose a scheme where honest agents learn who is malicious and dynamically reduce communication with (i.e., "block") them. We show that collaboration indeed decreases regret for this algorithm, assuming mm is small compared to KK but without assumptions on malicious agents' behavior, thus ensuring that our algorithm is robust against any malicious recommendation strategy.

View on arXiv
Comments on this paper