Deep Multi-Agent Reinforcement Learning for Decentralized Continuous Cooperative Control

Neural Information Processing Systems (NeurIPS), 2020

14 March 2020

Bei Peng

Tabish Rashid

Christian Schroeder de Witt

Pierre-Alexandre Kamienny

Juil Sock

Wendelin Böhmer

Abstract

Centralised training with decentralised execution (CTDE) is an important learning paradigm in multi-agent reinforcement learning (MARL). To make progress in CTDE, we introduce Multi-Agent MuJoCo (MAMuJoCo), a novel benchmark suite that applies to continuous robotic control tasks, unlike the StarCraft Multi-Agent Challenge (SMAC), the predominant benchmark environment. To demonstrate the utility of MAMuJoCo, we present a range of benchmark results on this new suite, including a comparison of the state-of-the-art actor-critic method MADDPG against two novel variants of existing methods. These new methods outperform MADDPG on a number of MAMuJoCo tasks. In addition, we show that, in these continuous cooperative MAMuJoCo tasks, value factorisation plays a greater role in performance than the underlying algorithmic choices. This highlights the need to extend the study of value factorisation from $Q$-learning to actor-critic algorithms.
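To make the idea of value factorisation concrete, here is a minimal sketch. It is not the method from the paper. It assumes two agents, toy per-agent utilities `q_agent`, and a small discretised action grid `ACTIONS` (MAMuJoCo actions are actually continuous). All names are hypothetical. The sketch contrasts a VDN-style additive mixer with a QMIX-style monotonic mixer, two factorisations known from the $Q$-learning literature. It then checks that, under either mixer, each agent greedily maximising its own utility recovers the joint maximiser. This property is what lets a centrally trained factored value function support decentralised execution.

```python
import itertools

# Hypothetical discretised action grid (assumption: real MAMuJoCo actions are continuous).
ACTIONS = [-1.0, -0.5, 0.0, 0.5, 1.0]
# Hypothetical per-agent preferred actions used to build toy utilities.
TARGETS = [0.5, -0.5]


def q_agent(i, a):
    """Toy per-agent utility Q_i(a): highest when a equals TARGETS[i]."""
    return -(a - TARGETS[i]) ** 2


def vdn_mix(qs):
    """VDN-style additive factorisation: Q_tot = sum_i Q_i."""
    return sum(qs)


def monotonic_mix(qs, weights, bias):
    """QMIX-style monotonic mixing: abs() makes weights non-negative, so dQ_tot/dQ_i >= 0."""
    return sum(abs(w) * q for w, q in zip(weights, qs)) + bias


def decentralised_greedy(n_agents):
    """Each agent picks the action that maximises its own utility, independently."""
    return tuple(max(ACTIONS, key=lambda a: q_agent(i, a)) for i in range(n_agents))


def centralised_argmax(mixer, n_agents):
    """Enumerate every joint action and return the one maximising the mixed value."""
    return max(
        itertools.product(ACTIONS, repeat=n_agents),
        key=lambda joint: mixer([q_agent(i, a) for i, a in enumerate(joint)]),
    )


if __name__ == "__main__":
    print("decentralised:", decentralised_greedy(2))
    print("VDN argmax:   ", centralised_argmax(vdn_mix, 2))
    print("QMIX argmax:  ", centralised_argmax(lambda qs: monotonic_mix(qs, [-2.0, 0.3], 1.0), 2))
```

With strictly positive effective weights, the joint argmax decomposes into per-agent argmaxes, so all three lines agree. An actor-critic variant would replace the enumeration with per-agent policies trained through the mixer; that extension is not shown here.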
