Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1910.12809
Cited By
Minimax Weight and Q-Function Learning for Off-Policy Evaluation
28 October 2019
Masatoshi Uehara
Jiawei Huang
Nan Jiang
OffRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Minimax Weight and Q-Function Learning for Off-Policy Evaluation"
50 / 56 papers shown
Title
Reinforcement Learning with Continuous Actions Under Unmeasured Confounding
Yuhan Li
Eugene Han
Yifan Hu
Wenzhuo Zhou
Zhengling Qi
Yifan Cui
Ruoqing Zhu
OffRL
150
0
0
01 May 2025
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
85
0
0
26 Feb 2025
Statistical Inference in Reinforcement Learning: A Selective Survey
Chengchun Shi
OffRL
69
0
0
22 Feb 2025
Spatially Randomized Designs Can Enhance Policy Evaluation
Ying Yang
Chengchun Shi
Fang Yao
Shouyang Wang
Hongtu Zhu
OffRL
41
0
0
18 Mar 2024
On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation
Yuheng Zhang
Nan Jiang
OffRL
29
4
0
22 Feb 2024
MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning
Mao Hong
Zhiyue Zhang
Yue Wu
Yan Xu
OffRL
48
0
0
21 Jan 2024
Neural Network Approximation for Pessimistic Offline Reinforcement Learning
Di Wu
Yuling Jiao
Li Shen
Haizhao Yang
Xiliang Lu
OffRL
29
1
0
19 Dec 2023
Stackelberg Batch Policy Learning
Wenzhuo Zhou
Annie Qu
OffRL
35
0
0
28 Sep 2023
The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation
P. Amortila
Nan Jiang
Csaba Szepesvári
OffRL
29
3
0
25 Jul 2023
Offline Primal-Dual Reinforcement Learning for Linear MDPs
Germano Gabbianelli
Gergely Neu
Nneka Okolo
Matteo Papini
OffRL
29
7
0
22 May 2023
Distributional Offline Policy Evaluation with Predictive Error Guarantees
Runzhe Wu
Masatoshi Uehara
Wen Sun
OffRL
38
13
0
19 Feb 2023
HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare
Ge Gao
Song Ju
Markel Sanz Ausin
Min Chi
OffRL
29
8
0
18 Feb 2023
Minimax Instrumental Variable Regression and
L
2
L_2
L
2
Convergence Guarantees without Identification or Closedness
Andrew Bennett
Nathan Kallus
Xiaojie Mao
Whitney Newey
Vasilis Syrgkanis
Masatoshi Uehara
33
14
0
10 Feb 2023
Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability
Hanlin Zhu
Amy Zhang
OffRL
24
2
0
07 Feb 2023
A Reinforcement Learning Framework for Dynamic Mediation Analysis
Linjuan Ge
Jitao Wang
C. Shi
Zhanghua Wu
Rui Song
29
5
0
31 Jan 2023
Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning
Hanlin Zhu
Paria Rashidinejad
Jiantao Jiao
OffRL
38
15
0
30 Jan 2023
Off-Policy Evaluation for Action-Dependent Non-Stationary Environments
Yash Chandak
Shiv Shankar
Nathaniel D. Bastian
Bruno Castro da Silva
Emma Brunskil
Philip S. Thomas
OffRL
44
6
0
24 Jan 2023
Offline Policy Optimization in RL with Variance Regularizaton
Riashat Islam
Samarth Sinha
Homanga Bharadhwaj
Samin Yeasar Arnob
Zhuoran Yang
Animesh Garg
Zhaoran Wang
Lihong Li
Doina Precup
OffRL
26
0
0
29 Dec 2022
Policy learning "without'' overlap: Pessimism and generalized empirical Bernstein's inequality
Ying Jin
Zhimei Ren
Zhuoran Yang
Zhaoran Wang
OffRL
32
25
0
19 Dec 2022
When is Realizability Sufficient for Off-Policy Reinforcement Learning?
Andrea Zanette
OffRL
16
14
0
10 Nov 2022
Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions
Audrey Huang
Nan Jiang
OffRL
51
9
0
27 Oct 2022
Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization
Gergely Neu
Nneka Okolo
32
6
0
21 Oct 2022
Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL
Fengzhuo Zhang
Boyi Liu
Kaixin Wang
Vincent Y. F. Tan
Zhuoran Yang
Zhaoran Wang
OffRL
LRM
51
10
0
20 Sep 2022
Offline Stochastic Shortest Path: Learning, Evaluation and Towards Optimality
Ming Yin
Wenjing Chen
Mengdi Wang
Yu-Xiang Wang
OffRL
30
4
0
10 Jun 2022
Model-Free and Model-Based Policy Evaluation when Causality is Uncertain
David Bruns-Smith
CML
ELM
OffRL
24
12
0
02 Apr 2022
Offline Reinforcement Learning Under Value and Density-Ratio Realizability: The Power of Gaps
Jinglin Chen
Nan Jiang
OffRL
21
33
0
25 Mar 2022
Review of Metrics to Measure the Stability, Robustness and Resilience of Reinforcement Learning
L. Pullum
13
2
0
22 Mar 2022
DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning
Jinxin Liu
Hongyin Zhang
Donglin Wang
OffRL
38
32
0
13 Mar 2022
A Complete Characterization of Linear Estimators for Offline Policy Evaluation
Juan C. Perdomo
A. Krishnamurthy
Peter L. Bartlett
Sham Kakade
OffRL
27
3
0
08 Mar 2022
Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process
C. Shi
Jin Zhu
Ye Shen
Shuang Luo
Hong Zhu
R. Song
OffRL
25
30
0
22 Feb 2022
A Statistical Analysis of Polyak-Ruppert Averaged Q-learning
Xiang Li
Wenhao Yang
Jiadong Liang
Zhihua Zhang
Michael I. Jordan
40
15
0
29 Dec 2021
Pessimistic Model Selection for Offline Deep Reinforcement Learning
Chao-Han Huck Yang
Zhengling Qi
Yifan Cui
Pin-Yu Chen
OffRL
24
4
0
29 Nov 2021
Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation
Dylan J. Foster
A. Krishnamurthy
D. Simchi-Levi
Yunzong Xu
OffRL
19
62
0
21 Nov 2021
SOPE: Spectrum of Off-Policy Estimators
C. J. Yuan
Yash Chandak
S. Giguere
Philip S. Thomas
S. Niekum
OffRL
50
5
0
06 Nov 2021
Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning
Wenzhuo Zhou
Ruoqing Zhu
Annie Qu
32
22
0
20 Oct 2021
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Andrea Zanette
Martin J. Wainwright
Emma Brunskill
OffRL
29
111
0
19 Aug 2021
Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning
Pratik Ramprasad
Yuantong Li
Zhuoran Yang
Zhaoran Wang
W. Sun
Guang Cheng
OffRL
50
26
0
08 Aug 2021
Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings
Shengpu Tang
Jenna Wiens
OffRL
26
78
0
23 Jul 2021
On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction
Jiawei Huang
Nan Jiang
11
5
0
02 Jun 2021
On Instrumental Variable Regression for Deep Offline Policy Evaluation
Yutian Chen
Liyuan Xu
Çağlar Gülçehre
T. Paine
Arthur Gretton
Nando de Freitas
Arnaud Doucet
OffRL
39
18
0
21 May 2021
Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
Ming Yin
Yu-Xiang Wang
OffRL
32
19
0
13 May 2021
Universal Off-Policy Evaluation
Yash Chandak
S. Niekum
Bruno C. da Silva
Erik Learned-Miller
Emma Brunskill
Philip S. Thomas
OffRL
ELM
32
52
0
26 Apr 2021
Nearly Horizon-Free Offline Reinforcement Learning
Tongzheng Ren
Jialian Li
Bo Dai
S. Du
Sujay Sanghavi
OffRL
26
49
0
25 Mar 2021
Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning
Yaqi Duan
Chi Jin
Zhiyuan Li
OffRL
20
47
0
25 Mar 2021
Instabilities of Offline RL with Pre-Trained Neural Representation
Ruosong Wang
Yifan Wu
Ruslan Salakhutdinov
Sham Kakade
OffRL
20
42
0
08 Mar 2021
Is Pessimism Provably Efficient for Offline RL?
Ying Jin
Zhuoran Yang
Zhaoran Wang
OffRL
27
346
0
30 Dec 2020
CoinDICE: Off-Policy Confidence Interval Estimation
Bo Dai
Ofir Nachum
Yinlam Chow
Lihong Li
Csaba Szepesvári
Dale Schuurmans
OffRL
27
84
0
22 Oct 2020
Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy
Zuyue Fu
Zhuoran Yang
Zhaoran Wang
15
42
0
02 Aug 2020
Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders
Andrew Bennett
Nathan Kallus
Lihong Li
Ali Mousavi
OffRL
35
43
0
27 Jul 2020
Batch Policy Learning in Average Reward Markov Decision Processes
Peng Liao
Zhengling Qi
Runzhe Wan
P. Klasnja
S. Murphy
OffRL
34
81
0
23 Jul 2020
1
2
Next