
Three Methods for Training on Bandit Feedback

24 April 2019

Papers citing "Three Methods for Training on Bandit Feedback"


RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising
D. Rohde, Stephen Bonner, Travis Dunlop, Flavian Vasile, Alexandros Karatzoglou
OffRL, 57 150 0, 02 Aug 2018

The Offset Tree for Learning with Partial Labels
A. Beygelzimer, John Langford
317 185 0, 21 Dec 2008