Towards Robust Offline Evaluation: A Causal and Information Theoretic Framework for Debiasing Ranking Systems

4 April 2025

Abstract

Evaluating retrieval-ranking systems is crucial for developing high-performing models. While online A/B testing is the gold standard, its high cost and risks to user experience call for effective offline methods. However, relying on historical interaction data introduces biases, such as selection, exposure, conformity, and position biases, that distort evaluation metrics. These biases stem from the Missing-Not-At-Random (MNAR) nature of user interactions and favor popular or frequently exposed items over true user preferences. We propose a novel framework for robust offline evaluation of retrieval-ranking systems that transforms MNAR data into Missing-At-Random (MAR) data through reweighting combined with black-box optimization, guided by neural estimation of information-theoretic metrics. Our contributions include (1) a causal formulation for addressing offline evaluation biases, (2) a system-agnostic debiasing framework, and (3) empirical validation of its effectiveness. This framework enables more accurate, fair, and generalizable evaluations, enhancing model assessment before deployment.
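To make the idea of "reweighting MNAR data toward MAR" concrete, here is a minimal sketch. The abstract does not specify how its weights are computed. The authors learn them through black-box optimization guided by neural information-theoretic estimates. This sketch instead uses classic inverse propensity scoring (IPS) and its self-normalized variant (SNIPS) as a stand-in, and it assumes the observation propensities are known. The function names and the toy log are hypothetical and illustrative only. This is not the paper's method.

```python
from typing import Sequence


def naive_mean(labels: Sequence[float]) -> float:
    """Mean relevance over observed interactions only; biased under MNAR."""
    return sum(labels) / len(labels)


def ips_mean(labels: Sequence[float], propensities: Sequence[float], n_total: int) -> float:
    """Inverse-propensity-weighted estimate of the mean over all n_total items.

    Each observed label is up-weighted by 1 / P(observed), so rarely exposed
    items count for more, as they would under random (MAR) logging.
    Assumes propensities are known and strictly positive.
    """
    return sum(y / p for y, p in zip(labels, propensities)) / n_total


def snips_mean(labels: Sequence[float], propensities: Sequence[float]) -> float:
    """Self-normalized IPS: divides by the total weight instead of n_total."""
    weights = [1.0 / p for p in propensities]
    return sum(w * y for w, y in zip(weights, labels)) / sum(weights)


# Hypothetical toy log: 10 items, 5 popular (relevant, propensity 0.8) and
# 5 niche (not relevant, propensity 0.2). True mean relevance is 0.5.
# In expectation, 4 popular items and 1 niche item end up in the log.
labels = [1.0, 1.0, 1.0, 1.0, 0.0]
propensities = [0.8, 0.8, 0.8, 0.8, 0.2]

print(naive_mean(labels))                  # 0.8 (inflated toward popular items)
print(ips_mean(labels, propensities, 10))  # 0.5
print(snips_mean(labels, propensities))    # 0.5
```

The naive average over logged interactions overstates relevance because popular items are overrepresented. Weighting each observation by the inverse of its exposure probability recovers the population mean in this toy case. SNIPS trades a small bias for lower variance when some propensities are tiny. Under this framing, the paper's contribution is a way of choosing such weights when propensities are not directly known.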


@article{khatami2025_2504.03997,
  title={Towards Robust Offline Evaluation: A Causal and Information Theoretic Framework for Debiasing Ranking Systems},
  author={Seyedeh Baharan Khatami and Sayan Chakraborty and Ruomeng Xu and Babak Salimi},
  journal={arXiv preprint arXiv:2504.03997},
  year={2025}
}
