Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation

28 May 2025
Hongyi Zhou
Josiah P. Hanna
Jin Zhu
Ying Yang
Chengchun Shi
    OffRL

Papers citing "Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation"

27 / 27 papers shown
Doubly Optimal Policy Evaluation for Reinforcement Learning
Shuze Liu
Claire Chen
Shangtong Zhang
OffRL
177
3
0
03 Oct 2024
Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning
Weidong Liu
Jiyuan Tu
Yichen Zhang
Xi Chen
OffRL
67
4
0
04 Oct 2023
Off-policy Evaluation in Doubly Inhomogeneous Environments
Zeyu Bian
C. Shi
Zhengling Qi
Lan Wang
OffRL
60
7
0
14 Jun 2023
The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation
Mark Rowland
Yunhao Tang
Clare Lyle
Rémi Munos
Marc G. Bellemare
Will Dabney
58
11
0
28 May 2023
An Instrumental Variable Approach to Confounded Off-Policy Evaluation
Yang Xu
Jin Zhu
C. Shi
Shuang Luo
R. Song
OffRL
91
18
0
29 Dec 2022
A Review of Off-Policy Evaluation in Reinforcement Learning
Masatoshi Uehara
C. Shi
Nathan Kallus
OffRL
94
76
0
13 Dec 2022
Low Variance Off-policy Evaluation with State-based Importance Sampling
David M. Bossens
Philip S. Thomas
OffRL
47
2
0
07 Dec 2022
ReVar: Strengthening Policy Evaluation via Reduced Variance Sampling
Subhojyoti Mukherjee
Josiah P. Hanna
Robert D. Nowak
OffRL
63
15
0
09 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM
880
13,148
0
04 Mar 2022
Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process
C. Shi
Jin Zhu
Ye Shen
Shuang Luo
Hong Zhu
R. Song
OffRL
93
34
0
22 Feb 2022
On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation
Xiaohong Chen
Zhengling Qi
OffRL
74
35
0
17 Jan 2022
A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes
C. Shi
Masatoshi Uehara
Jiawei Huang
Nan Jiang
OffRL
65
26
0
12 Nov 2021
Off-Policy Evaluation in Partially Observed Markov Decision Processes under Sequential Ignorability
Yuchen Hu
Stefan Wager
OffRL
102
26
0
24 Oct 2021
Deeply-Debiased Off-Policy Interval Estimation
C. Shi
Runzhe Wan
Victor Chernozhukov
R. Song
OffRL
45
38
0
10 May 2021
Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
Botao Hao
X. Ji
Yaqi Duan
Hao Lu
Csaba Szepesvári
Mengdi Wang
OffRL
46
40
0
06 Feb 2021
Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding
Hongseok Namkoong
Ramtin Keramati
Steve Yadlowsky
Emma Brunskill
OffRL
154
65
0
12 Mar 2020
Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning
Nathan Kallus
Angela Zhou
OffRL
85
60
0
11 Feb 2020
More Efficient Off-Policy Evaluation through Regularized Targeted Learning
Aurélien F. Bibaut
Ivana Malenica
N. Vlassis
Mark van der Laan
OOD, OffRL
42
41
0
13 Dec 2019
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
Ziyang Tang
Yihao Feng
Lihong Li
Dengyong Zhou
Qiang Liu
OffRL
148
69
0
16 Oct 2019
Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes
Nathan Kallus
Masatoshi Uehara
OffRL
92
186
0
22 Aug 2019
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
Ofir Nachum
Yinlam Chow
Bo Dai
Lihong Li
OffRL
151
337
0
10 Jun 2019
Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
Tengyang Xie
Yifei Ma
Yu-Xiang Wang
OffRL
97
181
0
08 Jun 2019
Combining Parametric and Nonparametric Models for Off-Policy Evaluation
Omer Gottesman
Yao Liu
Scott Sussex
Emma Brunskill
Finale Doshi-Velez
OffRL
73
36
0
14 May 2019
Information-Theoretic Considerations in Batch Reinforcement Learning
Jinglin Chen
Nan Jiang
OOD, OffRL
161
378
0
01 May 2019
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
Qiang Liu
Lihong Li
Ziyang Tang
Dengyong Zhou
OffRL
158
356
0
29 Oct 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
526
19,237
0
20 Jul 2017
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
Philip S. Thomas
Emma Brunskill
OffRL
432
576
0
04 Apr 2016