
Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning

Abstract

In many real-world tasks, multiple agents must learn to coordinate with each other given their private observations and limited communication ability. Deep multiagent reinforcement learning (Deep-MARL) algorithms have shown superior performance in such challenging settings. One representative class of work is multiagent value decomposition, which decomposes the global shared multiagent Q-value $Q_{tot}$ into individual Q-values $Q^{i}$ to guide individuals' behaviors, e.g., VDN imposing an additive form and QMIX adopting a monotonic assumption with an implicit mixing method. However, most previous efforts impose certain assumptions on the relation between $Q_{tot}$ and $Q^{i}$ and lack theoretical grounding. Besides, they do not explicitly consider the agent-level impact of individuals on the whole system when transforming individual $Q^{i}$s into $Q_{tot}$. In this paper, we theoretically derive a general formula of $Q_{tot}$ in terms of $Q^{i}$, based on which we can naturally implement a multi-head attention formulation to approximate $Q_{tot}$, resulting in not only a refined representation of $Q_{tot}$ with an agent-level attention mechanism, but also a tractable maximization algorithm for decentralized policies. Extensive experiments demonstrate that our method outperforms state-of-the-art MARL methods on the widely adopted StarCraft benchmark across different scenarios, and attention analysis is further conducted with valuable insights.
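To make the agent-level attention mixing described above concrete, the following is a minimal sketch (not the authors' released code) of a mixer that combines individual $Q^{i}$ values into $Q_{tot}$ via multi-head attention over agents, with a state-dependent constant term. Names such as QattenMixer, state_dim, agent_dim, and n_heads are illustrative assumptions, as are the exact network sizes.

```python
# Sketch of an agent-level multi-head attention mixer, assuming a
# Qatten-style form Q_tot = c(s) + sum_h sum_i lambda_{i,h} * Q^i.
import torch
import torch.nn as nn
import torch.nn.functional as F


class QattenMixer(nn.Module):
    def __init__(self, n_agents, state_dim, agent_dim, embed_dim=32, n_heads=4):
        super().__init__()
        self.n_heads = n_heads
        # Per head: a query projection of the global state and a key
        # projection of per-agent features.
        self.query_projs = nn.ModuleList(
            [nn.Linear(state_dim, embed_dim) for _ in range(n_heads)])
        self.key_projs = nn.ModuleList(
            [nn.Linear(agent_dim, embed_dim) for _ in range(n_heads)])
        # State-dependent constant c(s) added to the attention-weighted sum.
        self.constant = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state, agent_feats):
        # agent_qs:    (batch, n_agents)            individual Q^i values
        # state:       (batch, state_dim)           global state s
        # agent_feats: (batch, n_agents, agent_dim) per-agent features
        head_totals = []
        for h in range(self.n_heads):
            q = self.query_projs[h](state)            # (batch, embed_dim)
            k = self.key_projs[h](agent_feats)        # (batch, n_agents, embed_dim)
            # Scaled dot-product attention over agents gives lambda_{i,h}.
            scores = torch.einsum('be,bae->ba', q, k) / k.shape[-1] ** 0.5
            lam = F.softmax(scores, dim=-1)           # (batch, n_agents)
            # Softmax weights are non-negative, so Q_tot stays monotonic in
            # each Q^i and decentralized greedy actions remain consistent
            # with the global argmax.
            head_totals.append((lam * agent_qs).sum(dim=-1, keepdim=True))
        q_tot = torch.stack(head_totals, dim=-1).sum(dim=-1) + self.constant(state)
        return q_tot                                  # (batch, 1)
```

In practice such a mixer would be trained end-to-end with the individual agent networks by TD error on $Q_{tot}$, in the same centralized-training, decentralized-execution setup used by VDN and QMIX.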
