
Ensemble-MIX: Enhancing Sample Efficiency in Multi-Agent RL Using Ensemble Methods

Main: 9 Pages
6 Figures
1 Table
Bibliography: 3 Pages
Appendix: 6 Pages
Abstract

Multi-agent reinforcement learning (MARL) methods have achieved state-of-the-art results on a range of multi-agent tasks. Yet, MARL algorithms typically require significantly more environment interactions than their single-agent counterparts to converge, a problem exacerbated by the difficulty of exploring a large joint action space and the high variance intrinsic to MARL environments. To tackle these issues, we propose a novel algorithm that combines a decomposed centralized critic with decentralized ensemble learning, incorporating several key contributions. The main component of our scheme is a selective exploration method that leverages ensemble kurtosis. We extend the global decomposed critic with a diversity-regularized ensemble of individual critics and use its excess kurtosis to guide exploration toward high-uncertainty states and actions. To improve sample efficiency, we train the centralized critic with a novel truncated variation of the TD(λ) algorithm, enabling efficient off-policy learning with reduced variance. On the actor side, our algorithm adapts the mixed-samples approach to MARL, mixing on-policy and off-policy loss functions for training the actors. This approach balances stability and efficiency and outperforms purely off-policy learning. Our evaluation shows that the method outperforms state-of-the-art baselines on standard MARL benchmarks, including a variety of SMAC II maps.
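To make the kurtosis-guided exploration idea concrete, the sketch below shows one plausible way an agent could score actions using the excess kurtosis of an ensemble of Q-estimates. This is a minimal illustration only: the function names, array shapes, and the beta weight are assumptions, not the paper's actual implementation, and the real method operates on a decomposed centralized critic rather than a single agent's Q-values.

```python
import numpy as np


def excess_kurtosis(samples, axis=0, eps=1e-8):
    """Fisher (excess) kurtosis of ensemble estimates along `axis`."""
    mean = samples.mean(axis=axis, keepdims=True)
    var = samples.var(axis=axis)
    fourth = ((samples - mean) ** 4).mean(axis=axis)
    return fourth / (var ** 2 + eps) - 3.0


def select_action(q_ensemble, beta=0.5):
    """Pick the action maximizing mean Q plus a kurtosis-based bonus.

    q_ensemble: array of shape (n_members, n_actions) holding each
        ensemble member's Q-estimate for the current state.
    beta: weight of the exploration bonus (hypothetical hyperparameter).
    """
    mean_q = q_ensemble.mean(axis=0)
    bonus = excess_kurtosis(q_ensemble, axis=0)  # high kurtosis ~ high uncertainty
    return int(np.argmax(mean_q + beta * bonus))


# Toy usage: 5 ensemble members, 4 actions.
rng = np.random.default_rng(0)
q = rng.normal(size=(5, 4))
print(select_action(q))
```

The bonus term directs the agent toward actions whose ensemble value distribution is heavy-tailed, which is one reading of "guide exploration toward high-uncertainty states and actions"; variance-based bonuses would be a simpler alternative with a similar structure.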

@article{danino2025_2506.02841,
  title={Ensemble-MIX: Enhancing Sample Efficiency in Multi-Agent RL Using Ensemble Methods},
  author={Tom Danino and Nahum Shimkin},
  journal={arXiv preprint arXiv:2506.02841},
  year={2025}
}