Deep Multi-Agent Reinforcement Learning for Decentralized Continuous Cooperative Control

Neural Information Processing Systems (NeurIPS), 2020

14 March 2020

Bei Peng

Tabish Rashid

Christian Schroeder de Witt

Pierre-Alexandre Kamienny

Juil Sock

Wendelin Bohmer

Abstract

Centralised training with decentralised execution (CTDE) is an important learning paradigm in multi-agent reinforcement learning (MARL). To make progress in CTDE, we introduce Multi-Agent Mujoco, a novel benchmark suite that, unlike StarCraft II, the predominant benchmark environment, applies to continuous robotic control tasks. To demonstrate the utility of Multi-Agent Mujoco, we present a range of benchmark results on this new suite, including comparing the state-of-the-art actor-critic method MADDPG against two novel variants of existing methods. These new methods outperform MADDPG on several Multi-Agent Mujoco tasks. In addition, we show that factorisation is key to performance, but other algorithmic choices are not. This motivates the necessity of extending the study of value factorisations from $Q$-learning to actor-critic algorithms.
