ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2502.21304
49
0

Clustering Context in Off-Policy Evaluation

28 February 2025
Daniel Guzman-Olivares
Philipp Schmidt
Jacek Golebiowski
Artur Bekasov
    CML
    OffRL
ArXivPDFHTML
Abstract

Off-policy evaluation can leverage logged data to estimate the effectiveness of new policies in e-commerce, search engines, media streaming services, or automatic diagnostic tools in healthcare. However, the performance of baseline off-policy estimators like IPS deteriorates when the logging policy significantly differs from the evaluation policy. Recent work proposes sharing information across similar actions to mitigate this problem. In this work, we propose an alternative estimator that shares information across similar contexts using clustering. We study the theoretical properties of the proposed estimator, characterizing its bias and variance under different conditions. We also compare the performance of the proposed estimator and existing approaches in various synthetic problems, as well as a real-world recommendation dataset. Our experimental results confirm that clustering contexts improves estimation accuracy, especially in deficient information settings.

View on arXiv
@article{guzman-olivares2025_2502.21304,
  title={ Clustering Context in Off-Policy Evaluation },
  author={ Daniel Guzman-Olivares and Philipp Schmidt and Jacek Golebiowski and Artur Bekasov },
  journal={arXiv preprint arXiv:2502.21304},
  year={ 2025 }
}
Comments on this paper