541

Deep Multi-Agent Reinforcement Learning for Decentralized Continuous Cooperative Control

Neural Information Processing Systems (NeurIPS), 2020
Abstract

Centralised training with decentralised execution (CTDE) is an important learning paradigm in multi-agent reinforcement learning (MARL). To make progress in CTDE, we introduce Multi-Agent Mujoco, a novel benchmark suite that, unlike StarCraft II, the predominant benchmark environment, applies to continuous robotic control tasks. To demonstrate the utility of Multi-Agent Mujoco, we present a range of benchmark results on this new suite, including comparing the state-of-the-art actor-critic method MADDPG against two novel variants of existing methods. These new methods outperform MADDPG on several Multi-Agent Mujoco tasks. In addition, we show that factorisation is key to performance, but other algorithmic choices are not. This motivates the necessity of extending the study of value factorisations from QQ-learning to actor-critic algorithms.

View on arXiv
Comments on this paper