2405.05736
Cited By
Optimal Baseline Corrections for Off-Policy Contextual Bandits
9 May 2024
Shashank Gupta
Olivier Jeunen
Harrie Oosterhuis
Maarten de Rijke
Papers citing
"Optimal Baseline Corrections for Off-Policy Contextual Bandits"
8 / 8 papers shown
Title
Counterfactual Inference under Thompson Sampling
Olivier Jeunen
OffRL
LRM
37
0
0
03 Apr 2025
A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning
Shashank Gupta
Chaitanya Ahuja
Tsung-Yu Lin
Sreya Dutta Roy
Harrie Oosterhuis
Maarten de Rijke
Satya Narayan Shukla
46
1
0
02 Mar 2025
Proximal Ranking Policy Optimization for Practical Safety in Counterfactual Learning to Rank
Shashank Gupta
Harrie Oosterhuis
Maarten de Rijke
OffRL
32
0
0
15 Sep 2024
A Simpler Alternative to Variational Regularized Counterfactual Risk Minimization
Hua Chang Bakker
Shashank Gupta
Harrie Oosterhuis
OffRL
28
0
0
15 Sep 2024
Practical and Robust Safety Guarantees for Advanced Counterfactual Learning to Rank
Shashank Gupta
Harrie Oosterhuis
Maarten de Rijke
43
6
0
29 Jul 2024
Multi-Objective Recommendation via Multivariate Policy Learning
Olivier Jeunen
Jatin Mandav
Ivan Potapov
Nakul Agarwal
Sourabh Vaid
Wenzhe Shi
Aleksei Ustimenko
OffRL
21
3
0
03 May 2024
Safe Deployment for Counterfactual Learning to Rank with Exposure-Based Risk Minimization
Shashank Gupta
Harrie Oosterhuis
Maarten de Rijke
32
14
0
26 Apr 2023
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Sergey Levine
Aviral Kumar
George Tucker
Justin Fu
OffRL
GP
340
1,960
0
04 May 2020