Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.22492
Cited By
Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation
28 May 2025
Hongyi Zhou
Josiah P. Hanna
Jin Zhu
Ying Yang
Chengchun Shi
OffRL
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation"
27 / 27 papers shown
Title
Doubly Optimal Policy Evaluation for Reinforcement Learning
Shuze Liu
Claire Chen
Shangtong Zhang
OffRL
177
3
0
03 Oct 2024
Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning
Weidong Liu
Jiyuan Tu
Yichen Zhang
Xi Chen
OffRL
67
4
0
04 Oct 2023
Off-policy Evaluation in Doubly Inhomogeneous Environments
Zeyu Bian
C. Shi
Zhengling Qi
Lan Wang
OffRL
60
7
0
14 Jun 2023
The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation
Mark Rowland
Yunhao Tang
Clare Lyle
Rémi Munos
Marc G. Bellemare
Will Dabney
55
11
0
28 May 2023
An Instrumental Variable Approach to Confounded Off-Policy Evaluation
Yang Xu
Jin Zhu
C. Shi
Shuang Luo
R. Song
OffRL
91
18
0
29 Dec 2022
A Review of Off-Policy Evaluation in Reinforcement Learning
Masatoshi Uehara
C. Shi
Nathan Kallus
OffRL
92
76
0
13 Dec 2022
Low Variance Off-policy Evaluation with State-based Importance Sampling
David M. Bossens
Philip S. Thomas
OffRL
47
2
0
07 Dec 2022
ReVar: Strengthening Policy Evaluation via Reduced Variance Sampling
Subhojyoti Mukherjee
Josiah P. Hanna
Robert D. Nowak
OffRL
63
15
0
09 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
880
13,148
0
04 Mar 2022
Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process
C. Shi
Jin Zhu
Ye Shen
Shuang Luo
Hong Zhu
R. Song
OffRL
93
34
0
22 Feb 2022
On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation
Xiaohong Chen
Zhengling Qi
OffRL
74
35
0
17 Jan 2022
A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes
C. Shi
Masatoshi Uehara
Jiawei Huang
Nan Jiang
OffRL
65
26
0
12 Nov 2021
Off-Policy Evaluation in Partially Observed Markov Decision Processes under Sequential Ignorability
Yupeng Tang
Seung-seob Lee
OffRL
102
26
0
24 Oct 2021
Deeply-Debiased Off-Policy Interval Estimation
C. Shi
Runzhe Wan
Victor Chernozhukov
R. Song
OffRL
45
38
0
10 May 2021
Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
Botao Hao
X. Ji
Yaqi Duan
Hao Lu
Csaba Szepesvári
Mengdi Wang
OffRL
46
40
0
06 Feb 2021
Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding
Hongseok Namkoong
Ramtin Keramati
Steve Yadlowsky
Emma Brunskill
OffRL
154
65
0
12 Mar 2020
Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning
Nathan Kallus
Angela Zhou
OffRL
82
60
0
11 Feb 2020
More Efficient Off-Policy Evaluation through Regularized Targeted Learning
Aurélien F. Bibaut
Ivana Malenica
N. Vlassis
Mark van der Laan
OOD
OffRL
42
41
0
13 Dec 2019
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
Ziyang Tang
Yihao Feng
Lihong Li
Dengyong Zhou
Qiang Liu
OffRL
148
69
0
16 Oct 2019
Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes
Nathan Kallus
Masatoshi Uehara
OffRL
89
186
0
22 Aug 2019
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
Ofir Nachum
Yinlam Chow
Bo Dai
Lihong Li
OffRL
151
337
0
10 Jun 2019
Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
Tengyang Xie
Yifei Ma
Yu Wang
OffRL
97
181
0
08 Jun 2019
Combining Parametric and Nonparametric Models for Off-Policy Evaluation
Omer Gottesman
Yao Liu
Scott Sussex
Emma Brunskill
Finale Doshi-Velez
OffRL
70
36
0
14 May 2019
Information-Theoretic Considerations in Batch Reinforcement Learning
Jinglin Chen
Nan Jiang
OOD
OffRL
161
378
0
01 May 2019
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
Qiang Liu
Lihong Li
Ziyang Tang
Dengyong Zhou
OffRL
158
356
0
29 Oct 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
526
19,237
0
20 Jul 2017
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
Philip S. Thomas
Emma Brunskill
OffRL
432
576
0
04 Apr 2016
1