Counterfactual Multi-player Bandits for Explainable Recommendation Diversification

Existing recommender systems tend to prioritize items closely aligned with users' historical interactions, inevitably trapping users in the "filter bubble" dilemma. Recent efforts have focused on improving the diversity of recommendations. However, these methods suffer from two major issues: 1) a lack of explainability, which makes it difficult for system designers to understand how diverse recommendations are generated, and 2) a restriction to specific metrics, which makes it difficult to improve non-differentiable diversity metrics. To this end, we propose a Counterfactual Multi-player Bandits (CMB) method that delivers explainable recommendation diversification across a wide range of diversity metrics. Leveraging a counterfactual framework, our method identifies the factors that influence diversity outcomes. Meanwhile, we adopt multi-player bandits to optimize the counterfactual objective, which makes the method applicable to both differentiable and non-differentiable diversity metrics. Extensive experiments on three real-world datasets demonstrate the applicability, effectiveness, and explainability of CMB.
Yansen Zhang, Bowei He, Xiaokun Zhang, Haolun Wu, Zexu Sun, Chen Ma. arXiv preprint arXiv:2505.21165, 2025.
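The abstract does not spell out the CMB algorithm, so the following is only a minimal sketch of the general idea it describes, under loudly stated assumptions. Each history item is treated as a hypothetical "player" with two arms: keep the item, or counterfactually mask it (as if the user had never interacted with it). All players share one reward: a non-differentiable diversity metric (category coverage of the top-k list) minus a sparsity penalty on the number of masked items. The toy data, the scoring rule, `HISTORY_WEIGHT`, `PENALTY`, `team_reward`, and the explore-then-commit strategy are all hypothetical illustrations, not the authors' method.

```python
import random

# Hypothetical toy data (not from the paper): a user's history and a
# candidate pool, each item labelled with a single category.
HISTORY = ["action", "action", "comedy", "action"]
CANDIDATES = [  # (category, base relevance score)
    ("comedy", 0.95), ("drama", 0.82), ("horror", 0.66),
    ("action", 0.50), ("action", 0.49), ("action", 0.48), ("action", 0.47),
]
TOP_K = 4
HISTORY_WEIGHT = 0.3  # hypothetical boost per unmasked history item of a category
PENALTY = 0.1         # hypothetical cost per masked item, keeps explanations sparse


def recommend(mask):
    """Top-k categories after counterfactually removing the masked history items."""
    counts = {}
    for category, masked in zip(HISTORY, mask):
        if not masked:
            counts[category] = counts.get(category, 0) + 1
    scores = [HISTORY_WEIGHT * counts.get(cat, 0) + base for cat, base in CANDIDATES]
    ranked = sorted(range(len(CANDIDATES)), key=lambda i: -scores[i])
    return [CANDIDATES[i][0] for i in ranked[:TOP_K]]


def coverage(recs):
    """Category coverage: a non-differentiable diversity metric."""
    return len(set(recs))


def team_reward(mask):
    """Shared reward seen by every player: diversity minus sparsity penalty."""
    return coverage(recommend(mask)) - PENALTY * sum(mask)


def explore_then_commit(rounds=4000, seed=0):
    """One player per history item; arm 0 = keep, arm 1 = mask.
    All players explore uniformly at random and independently, then each
    commits to the arm with the higher average shared reward."""
    rng = random.Random(seed)
    n_players = len(HISTORY)
    totals = [[0.0, 0.0] for _ in range(n_players)]
    pulls = [[0, 0] for _ in range(n_players)]
    for _ in range(rounds):
        mask = [rng.randint(0, 1) for _ in range(n_players)]
        reward = team_reward(mask)
        for player, arm in enumerate(mask):
            totals[player][arm] += reward
            pulls[player][arm] += 1
    means = [[t / max(n, 1) for t, n in zip(tot, cnt)]
             for tot, cnt in zip(totals, pulls)]
    return [int(m[1] > m[0]) for m in means]


if __name__ == "__main__":
    mask = explore_then_commit()
    print("masked history items:", [h for h, m in zip(HISTORY, mask) if m])
    print("before:", recommend([0] * len(HISTORY)))
    print("after: ", recommend(mask))
```

In this toy setting the unmasked history yields four "action" recommendations, and the learned mask removes the three "action" history items while keeping the "comedy" one, which raises coverage from 1 to 4. The masked items then serve as a counterfactual explanation of what limited diversity. Explore-then-commit is chosen here only for simplicity: because every player explores uniformly and independently, each player's difference in arm means estimates its average marginal effect on the shared reward, and no gradient of the coverage metric is ever needed.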