Learning from eXtreme Bandit Feedback

27 September 2020

Inderjit S. Dhillon

Michael I. Jordan

Papers citing "Learning from eXtreme Bandit Feedback"

Title | Authors | Tags | Counts | Date
On (Normalised) Discounted Cumulative Gain as an Off-Policy Evaluation Metric for Top-$n$ Recommendation | Olivier Jeunen, Ivan Potapov, Aleksei Ustimenko | ELM, OffRL | 27, 11, 0 | 27 Jul 2023
Constrained Policy Optimization for Controlled Self-Learning in Conversational AI Systems | Mohammad Kachuee, Sungjin Lee | | 73, 4, 0 | 17 Sep 2022
Supporting Massive DLRM Inference Through Software Defined Memory | E. K. Ardestani, Changkyu Kim, Seung Jae Lee, Luoshang Pan, Valmiki Rampersad, ..., Krishnakumar Nair, Maxim Naumov, Christopher Peterson, M. Smelyanskiy, Vijay Rao | BDL | 33, 20, 0 | 21 Oct 2021
On component interactions in two-stage recommender systems | Jiri Hron, K. Krauth, Michael I. Jordan, Niki Kilbertus | CML, LRM | 40, 31, 0 | 28 Jun 2021
Learning Representations for Counterfactual Inference | Fredrik D. Johansson, Uri Shalit, David Sontag | CML, OOD, BDL | 232, 719, 0 | 12 May 2016
Off-Policy Actor-Critic | T. Degris, Martha White, R. Sutton | OffRL, CML | 163, 220, 0 | 22 May 2012