Optimal and Adaptive Off-policy Evaluation in Contextual Bandits

4 December 2016
Yu Wang
Alekh Agarwal
Miroslav Dudík
    OffRL

Papers citing "Optimal and Adaptive Off-policy Evaluation in Contextual Bandits"

50 / 51 papers shown
DOLCE: Decomposing Off-Policy Evaluation/Learning into Lagged and Current Effects
Shu Tamano
Masanori Nojima
OffRL
42
0
0
02 May 2025
Cross-Validated Off-Policy Evaluation
Matej Cief
Branislav Kveton
Michal Kompan
OffRL
33
1
0
24 May 2024
Hyperparameter Optimization Can Even be Harmful in Off-Policy Learning and How to Deal with It
Yuta Saito
Masahiro Nomura
OffRL
55
2
0
23 Apr 2024
Bayesian Off-Policy Evaluation and Learning for Large Action Spaces
Imad Aouali
Victor-Emmanuel Brunel
David Rohde
Anna Korba
OffRL
41
5
0
22 Feb 2024
Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction
Haruka Kiyohara
Masahiro Nomura
Yuta Saito
27
5
0
03 Feb 2024
Individualized Policy Evaluation and Learning under Clustered Network Interference
Yi Zhang
Kosuke Imai
OffRL
42
1
0
04 Nov 2023
Distributional Off-Policy Evaluation for Slate Recommendations
Shreyas Chaudhari
David Arbour
Georgios Theocharous
N. Vlassis
OffRL
46
0
0
27 Aug 2023
Online learning in bandits with predicted context
Yongyi Guo
Ziping Xu
Susan Murphy
26
4
0
26 Jul 2023
Balanced Off-Policy Evaluation for Personalized Pricing
Adam N. Elmachtoub
Vishal Gupta
Yunfan Zhao
OffRL
42
6
0
24 Feb 2023
SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits
Subhojyoti Mukherjee
Qiaomin Xie
Josiah P. Hanna
R. Nowak
OffRL
58
5
0
29 Jan 2023
Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency
Wenlong Mou
Peng Ding
Martin J. Wainwright
Peter L. Bartlett
OffRL
40
10
0
16 Jan 2023
A Review of Off-Policy Evaluation in Reinforcement Learning
Masatoshi Uehara
C. Shi
Nathan Kallus
OffRL
43
69
0
13 Dec 2022
Bayesian Counterfactual Mean Embeddings and Off-Policy Evaluation
Diego Martinez-Taboada
Dino Sejdinovic
CML
OffRL
27
0
0
02 Nov 2022
Deploying a Steered Query Optimizer in Production at Microsoft
Wangda Zhang
Matteo Interlandi
Paul Mineiro
S. Qiao
Nasim Ghazanfari
Marc T. Friedman
Rafah Hosn
Hiren Patel
Alekh Jindal
28
23
0
24 Oct 2022
Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency
Wenlong Mou
Martin J. Wainwright
Peter L. Bartlett
OffRL
43
11
0
26 Sep 2022
Fast Offline Policy Optimization for Large Scale Recommendation
Otmane Sakhi
D. Rohde
Alexandre Gilotte
OffRL
50
3
0
08 Aug 2022
Uncertainty Quantification for Fairness in Two-Stage Recommender Systems
Lequn Wang
Thorsten Joachims
30
22
0
30 May 2022
Safe Exploration for Efficient Policy Evaluation and Comparison
Runzhe Wan
Branislav Kveton
Rui Song
OffRL
38
10
0
26 Feb 2022
Off-Policy Evaluation Using Information Borrowing and Context-Based Switching
Sutanoy Dasgupta
Yabo Niu
Kishan Panaganti
D. Kalathil
D. Pati
Bani Mallick
OffRL
31
0
0
18 Dec 2021
Safe Data Collection for Offline and Online Policy Learning
Ruihao Zhu
Branislav Kveton
OffRL
21
5
0
08 Nov 2021
Off-Policy Evaluation in Partially Observed Markov Decision Processes under Sequential Ignorability
Yupeng Tang
Seung-seob Lee
OffRL
59
22
0
24 Oct 2021
Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior
Noriyuki Kojima
Alane Suhr
Yoav Artzi
30
24
0
10 Aug 2021
Bandit Algorithms for Precision Medicine
Yangyi Lu
Ziping Xu
Ambuj Tewari
66
11
0
10 Aug 2021
Evaluating the progress of Deep Reinforcement Learning in the real world: aligning domain-agnostic and domain-specific research
J. Luis
E. Crawley
B. Cameron
OffRL
27
6
0
07 Jul 2021
Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
Ming Yin
Yu Wang
OffRL
36
19
0
13 May 2021
Policy Learning with Adaptively Collected Data
Ruohan Zhan
Zhimei Ren
Susan Athey
Zhengyuan Zhou
OffRL
45
27
0
05 May 2021
Off-Policy Risk Assessment in Contextual Bandits
Audrey Huang
Liu Leqi
Zachary Chase Lipton
Kamyar Azizzadenesheli
OffRL
32
36
0
18 Apr 2021
Benchmarks for Deep Off-Policy Evaluation
Justin Fu
Mohammad Norouzi
Ofir Nachum
George Tucker
Ziyun Wang
...
Yutian Chen
Aviral Kumar
Cosmin Paduraru
Sergey Levine
T. Paine
ELM
OffRL
35
100
0
30 Mar 2021
Instabilities of Offline RL with Pre-Trained Neural Representation
Ruosong Wang
Yifan Wu
Ruslan Salakhutdinov
Sham Kakade
OffRL
22
42
0
08 Mar 2021
Estimating Average Treatment Effects via Orthogonal Regularization
Tobias Hatt
Stefan Feuerriegel
CML
164
35
0
21 Jan 2021
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
Yuta Saito
Shunsuke Aihara
Megumi Matsutani
Yusuke Narita
OffRL
24
73
0
17 Aug 2020
Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning
Ming Yin
Yu Bai
Yu Wang
OffRL
44
31
0
07 Jul 2020
Off-policy Bandits with Deficient Support
Noveen Sachdeva
Yi-Hsun Su
Thorsten Joachims
OffRL
38
75
0
16 Jun 2020
Batch Stationary Distribution Estimation
Junfeng Wen
Bo Dai
Lihong Li
Dale Schuurmans
OffRL
22
22
0
02 Mar 2020
Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning
Ming Yin
Yu Wang
OffRL
29
80
0
29 Jan 2020
Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning
Cameron Voloshin
Hoang Minh Le
Nan Jiang
Yisong Yue
OffRL
32
152
0
15 Nov 2019
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
Ziyang Tang
Yihao Feng
Lihong Li
Dengyong Zhou
Qiang Liu
OffRL
30
67
0
16 Oct 2019
Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies
Xinyun Chen
Lu Wang
Yizhe Hang
Heng Ge
H. Zha
OffRL
14
5
0
10 Oct 2019
Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes
Nathan Kallus
Masatoshi Uehara
OffRL
41
183
0
22 Aug 2019
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
Ofir Nachum
Yinlam Chow
Bo Dai
Lihong Li
OffRL
13
328
0
10 Jun 2019
Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
Nathan Kallus
Masatoshi Uehara
OffRL
22
54
0
09 Jun 2019
Balanced off-policy evaluation in general action spaces
A. Sondhi
David Arbour
Drew Dimmery
OffRL
29
17
0
09 Jun 2019
Empirical Likelihood for Contextual Bandits
Nikos Karampatziakis
John Langford
Paul Mineiro
OffRL
23
9
0
07 Jun 2019
Imitation-Regularized Offline Learning
Yifei Ma
Yu Wang
Balakrishnan Narayanaswamy
OffRL
14
22
0
15 Jan 2019
CAB: Continuous Adaptive Blending Estimator for Policy Evaluation and Learning
Yi-Hsun Su
Lequn Wang
Michele Santacatterina
Mohsen Guizani
CML
OffRL
12
6
0
06 Nov 2018
Confounding-Robust Policy Improvement
Nathan Kallus
Angela Zhou
CML
OffRL
40
152
0
22 May 2018
The Mirage of Action-Dependent Baselines in Reinforcement Learning
George Tucker
Surya Bhupatiraju
S. Gu
Richard Turner
Zoubin Ghahramani
Sergey Levine
OffRL
30
126
0
27 Feb 2018
Active Learning with Logged Data
Songbai Yan
Kamalika Chaudhuri
T. Javidi
26
27
0
25 Feb 2018
Policy Evaluation and Optimization with Continuous Treatments
Nathan Kallus
Angela Zhou
OffRL
11
132
0
16 Feb 2018
Estimation Considerations in Contextual Bandits
Maria Dimakopoulou
Zhengyuan Zhou
Susan Athey
Guido Imbens
32
69
0
19 Nov 2017