RIIT: Rethinking the Importance of Implementation Tricks in Multi-Agent
Reinforcement Learning
Recent years have seen revolutionary breakthroughs in Multi-Agent Deep Reinforcement Learning (MADRL), with successful applications to complex scenarios such as computer games and robot swarms. We investigate the impact of "implementation tricks" in state-of-the-art (SOTA) QMIX-based algorithms. First, we find that after minimal tuning, QMIX attains extraordinarily high win rates and achieves SOTA performance in the StarCraft Multi-Agent Challenge (SMAC). Second, we find that QMIX's monotonicity constraint improves sample efficiency in certain cooperative tasks. To verify the importance of the monotonicity constraint, we propose a new policy-based algorithm, RIIT, which achieves SOTA among policy-based algorithms. Finally, we prove that purely cooperative tasks can be represented by monotonic mixing networks. We open-source the code at \url{https://github.com/hijkzzz/pymarl2}.
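The monotonicity constraint discussed in the abstract can be illustrated with a QMIX-style mixing network: a hypernetwork maps the global state to mixing weights, and taking their absolute value guarantees that the joint value Q_tot is non-decreasing in every agent's individual Q-value. The following is a minimal NumPy sketch under assumed dimensions and parameter names; it is not the paper's implementation (which uses learned PyTorch hypernetworks and an ELU nonlinearity).

```python
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, STATE_DIM, EMBED = 3, 5, 8

# Hypothetical hypernetwork parameters: linear maps from the global state
# to the mixing network's weights and biases (randomly initialized here).
hyper_w1 = rng.standard_normal((N_AGENTS * EMBED, STATE_DIM)) * 0.1
hyper_b1 = rng.standard_normal((EMBED, STATE_DIM)) * 0.1
hyper_w2 = rng.standard_normal((EMBED, STATE_DIM)) * 0.1
hyper_b2 = rng.standard_normal((1, STATE_DIM)) * 0.1

def q_tot(agent_qs, state):
    """Monotonic mixing: abs() on the hypernetwork outputs enforces
    dQ_tot/dQ_i >= 0 for every agent i (QMIX's monotonicity constraint)."""
    w1 = np.abs(hyper_w1 @ state).reshape(N_AGENTS, EMBED)  # non-negative weights
    b1 = hyper_b1 @ state
    w2 = np.abs(hyper_w2 @ state)                            # non-negative weights
    b2 = hyper_b2 @ state
    hidden = np.maximum(agent_qs @ w1 + b1, 0.0)  # monotone nonlinearity (ReLU here)
    return float(hidden @ w2 + b2[0])

state = rng.standard_normal(STATE_DIM)
qs = rng.standard_normal(N_AGENTS)
base = q_tot(qs, state)
# Raising any single agent's Q-value can never lower Q_tot.
for i in range(N_AGENTS):
    bumped = qs.copy()
    bumped[i] += 1.0
    assert q_tot(bumped, state) >= base
```

Because the biases are unconstrained, the mixing network can still represent arbitrary state-dependent offsets; only the dependence on the per-agent Q-values is forced to be monotone.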