arXiv:1806.11500

Bayesian Counterfactual Risk Minimization

29 June 2018
Ben London
Ted Sandler
    OffRL
Abstract

We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization bound for the truncated inverse propensity score estimator. We apply the bound to a class of Bayesian policies, which motivates a novel, potentially data-dependent, regularization technique for CRM. Experimental results indicate that this technique outperforms standard $L_2$ regularization, and that it is competitive with variance regularization while being both simpler to implement and more computationally efficient.
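For readers unfamiliar with the estimator named in the abstract, the following is a minimal sketch of one common form of the truncated inverse propensity score (IPS) estimator. It assumes that each logged record carries an observed loss, the logging policy's propensity for the logged action, and the target policy's probability of that same action, and that truncation floors the propensity at a threshold tau. The function name, argument names, and the default value of tau are illustrative assumptions, and the paper's exact definition may differ.

```python
from typing import Sequence


def truncated_ips_risk(
    losses: Sequence[float],
    logging_propensities: Sequence[float],
    target_probs: Sequence[float],
    tau: float = 0.1,
) -> float:
    """Estimate a target policy's expected loss from logged bandit feedback.

    Assumed form (illustrative, not taken from the paper):
        (1/n) * sum_i loss_i * pi(a_i | x_i) / max(p_i, tau)
    where p_i is the logging propensity and tau in (0, 1] is the
    truncation threshold that caps each importance weight at 1/tau.
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    n = len(losses)
    if n == 0 or len(logging_propensities) != n or len(target_probs) != n:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for loss, p, pi in zip(losses, logging_propensities, target_probs):
        # Flooring the propensity at tau bounds the weight pi / p,
        # trading a small bias for lower variance.
        total += loss * pi / max(p, tau)
    return total / n


# Hypothetical logged data: three interactions.
example = truncated_ips_risk(
    losses=[1.0, 0.0, 1.0],
    logging_propensities=[0.5, 0.05, 0.25],
    target_probs=[0.5, 0.5, 0.5],
    tau=0.1,
)
```

With nonnegative losses, a larger tau can only shrink the importance weights, so the estimate is biased downward relative to the untruncated estimator in exchange for bounded weights.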
