arXiv:2205.06331
Collaborative Multi-agent Stochastic Linear Bandits

12 May 2022
Ahmadreza Moradipari
Mohammad Ghavamzadeh
M. Alizadeh
Abstract

We study a collaborative multi-agent stochastic linear bandit setting, where $N$ agents that form a network communicate locally to minimize their overall regret. In this setting, each agent has its own linear bandit problem (its own reward parameter), and the goal is to select the best global action with respect to the average of their reward parameters. At each round, each agent proposes an action, and one action is randomly selected and played as the network action. All the agents observe the corresponding reward of the played action and use an accelerated consensus procedure to compute an estimate of the average of the rewards obtained by all the agents. We propose a distributed upper confidence bound (UCB) algorithm and prove a high-probability bound on its $T$-round regret, in which we account for the linear growth of regret associated with each communication round. Our regret bound is of order $\mathcal{O}\big(\sqrt{\tfrac{T}{N \log(1/|\lambda_2|)}} \cdot (\log T)^2\big)$, where $\lambda_2$ is the second largest (in absolute value) eigenvalue of the communication matrix.
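The role of $\lambda_2$ in the bound can be illustrated with a small simulation: agents hold locally observed rewards and repeatedly average with their neighbors, and the distance to the network-wide average contracts geometrically at rate $|\lambda_2|$. The sketch below is illustrative only, under assumptions not taken from the paper: an 8-agent ring network with a hand-picked doubly stochastic averaging matrix `W`, and plain linear consensus `x <- W x` in place of the accelerated consensus procedure the paper actually uses.

```python
import numpy as np

N = 8  # hypothetical number of agents (assumption for illustration)

# Doubly stochastic averaging matrix for a ring network:
# each agent keeps half its value and averages the rest with its two neighbors.
W = np.zeros((N, N))
for i in range(N):
    W[i, i] = 0.5
    W[i, (i - 1) % N] = 0.25
    W[i, (i + 1) % N] = 0.25

# Second largest eigenvalue in absolute value governs the consensus rate.
eigvals = np.sort(np.abs(np.linalg.eigvals(W)))[::-1]
lam2 = eigvals[1]

rng = np.random.default_rng(0)
rewards = rng.normal(size=N)   # each agent's locally observed reward
target = rewards.mean()        # the network-wide average the agents estimate

x = rewards.copy()
errors = []
for t in range(50):
    x = W @ x                  # one local communication round
    errors.append(np.max(np.abs(x - target)))

# After t rounds the consensus error is bounded by C * |lambda_2|**t,
# which is why the regret bound scales with 1 / log(1 / |lambda_2|).
print(f"lambda_2 = {lam2:.4f}, final consensus error = {errors[-1]:.2e}")
```

A better-connected network has smaller $|\lambda_2|$, so fewer communication rounds are needed per consensus step, which is exactly how the network topology enters the regret bound above.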
