Ensemble-MIX: Enhancing Sample Efficiency in Multi-Agent RL Using Ensemble Methods

Multi-agent reinforcement learning (MARL) methods have achieved state-of-the-art results on a range of multi-agent tasks. Yet, MARL algorithms typically require significantly more environment interactions than their single-agent counterparts to converge, a problem exacerbated by the difficulty of exploring a large joint action space and the high variance intrinsic to MARL environments. To tackle these issues, we propose a novel algorithm that combines a decomposed centralized critic with decentralized ensemble learning, incorporating several key contributions. The main component of our scheme is a selective exploration method that leverages ensemble kurtosis. We extend the global decomposed critic with a diversity-regularized ensemble of individual critics and utilize its excess kurtosis to guide exploration toward high-uncertainty states and actions. To improve sample efficiency, we train the centralized critic with a novel truncated variation of the TD(λ) algorithm, enabling efficient off-policy learning with reduced variance. On the actor side, our suggested algorithm adapts the mixed samples approach to MARL, mixing on-policy and off-policy loss functions for training the actors. This approach balances stability and efficiency and outperforms purely off-policy learning. Our evaluation shows that the proposed method outperforms state-of-the-art baselines on standard MARL benchmarks, including a variety of SMAC II maps.
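To make the kurtosis-guided exploration idea concrete, the following minimal Python sketch illustrates one plausible way an exploration bonus could be derived from the excess kurtosis of an ensemble of per-action value estimates. The function names, the additive bonus form, and the beta weight are illustrative assumptions for exposition, not the paper's exact formulation.

import numpy as np
from scipy.stats import kurtosis

def excess_kurtosis_bonus(q_ensemble):
    # q_ensemble: array of shape (n_members, n_actions), one row of
    # Q-value estimates per ensemble member for the current state.
    # fisher=True returns excess kurtosis (kurtosis minus 3), so a
    # Gaussian spread of member estimates yields a bonus near zero,
    # while heavy-tailed disagreement yields a positive bonus.
    return kurtosis(q_ensemble, axis=0, fisher=True)

def select_action(q_ensemble, beta=0.5):
    # Illustrative selection rule (assumption): bias the greedy choice
    # toward actions whose ensemble estimates show high uncertainty.
    q_mean = q_ensemble.mean(axis=0)
    bonus = excess_kurtosis_bonus(q_ensemble)
    return int(np.argmax(q_mean + beta * bonus))

In this sketch, a larger beta trades off exploitation of the ensemble-mean value against exploration of actions where the ensemble members disagree in a heavy-tailed way.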
@article{danino2025_2506.02841,
  title={Ensemble-MIX: Enhancing Sample Efficiency in Multi-Agent RL Using Ensemble Methods},
  author={Tom Danino and Nahum Shimkin},
  journal={arXiv preprint arXiv:2506.02841},
  year={2025}
}