1604.00923
Cited By
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
4 April 2016
Philip S. Thomas
Emma Brunskill
OffRL
Papers citing "Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning"
50 / 342 papers shown
Title
Dr Jekyll and Mr Hyde: the Strange Case of Off-Policy Policy Updates
Romain Laroche
Rémi Tachet des Combes
46
8
0
29 Sep 2021
A Spectral Approach to Off-Policy Evaluation for POMDPs
Yash Nair
Nan Jiang
OffRL
26
17
0
22 Sep 2021
Accelerating Offline Reinforcement Learning Application in Real-Time Bidding and Recommendation: Potential Use of Simulation
Haruka Kiyohara
K. Kawakami
Yuta Saito
OffRL
32
12
0
17 Sep 2021
State Relevance for Off-Policy Evaluation
S. Shen
Yecheng Ma
Omer Gottesman
Finale Doshi-Velez
OffRL
16
4
0
13 Sep 2021
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Andrea Zanette
Martin J. Wainwright
Emma Brunskill
OffRL
34
115
0
19 Aug 2021
Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings
Shengpu Tang
Jenna Wiens
OffRL
26
78
0
23 Jul 2021
Conservative Offline Distributional Reinforcement Learning
Yecheng Jason Ma
Dinesh Jayaraman
Osbert Bastani
OffRL
73
79
0
12 Jul 2021
Supervised Off-Policy Ranking
Yue Jin
Yue Zhang
Tao Qin
Xudong Zhang
Jian Yuan
Houqiang Li
Tie-Yan Liu
OffRL
37
5
0
03 Jul 2021
On component interactions in two-stage recommender systems
Jiri Hron
K. Krauth
Michael I. Jordan
Niki Kilbertus
CML
LRM
42
31
0
28 Jun 2021
Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation
Yunhao Tang
Tadashi Kozuno
Mark Rowland
Rémi Munos
Michal Valko
OffRL
27
9
0
24 Jun 2021
Variance-Aware Off-Policy Evaluation with Linear Function Approximation
Yifei Min
Tianhao Wang
Dongruo Zhou
Quanquan Gu
OffRL
42
38
0
22 Jun 2021
Control Variates for Slate Off-Policy Evaluation
N. Vlassis
Ashok Chandrashekar
Fernando Amat Gil
Nathan Kallus
OffRL
28
9
0
15 Jun 2021
A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation
Scott Fujimoto
David Meger
Doina Precup
10
16
0
12 Jun 2021
Recomposing the Reinforcement Learning Building Blocks with Hypernetworks
Shai Keynan
Elad Sarafian
Sarit Kraus
OffRL
23
29
0
12 Jun 2021
Robust Generalization despite Distribution Shift via Minimum Discriminating Information
Tobias Sutter
Andreas Krause
Daniel Kuhn
OOD
27
10
0
08 Jun 2021
Offline Policy Comparison under Limited Historical Agent-Environment Interactions
Anton Dereventsov
Joseph Daws
Clayton Webster
OffRL
34
3
0
07 Jun 2021
Post-Contextual-Bandit Inference
Aurélien F. Bibaut
Antoine Chambaz
Maria Dimakopoulou
Nathan Kallus
Mark van der Laan
32
39
0
01 Jun 2021
Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs
Harsh Satija
Philip S. Thomas
Joelle Pineau
Romain Laroche
OffRL
43
21
0
31 May 2021
A unified view of likelihood ratio and reparameterization gradients
Paavo Parmas
Masashi Sugiyama
28
9
0
31 May 2021
On Instrumental Variable Regression for Deep Offline Policy Evaluation
Yutian Chen
Liyuan Xu
Çağlar Gülçehre
T. Paine
Arthur Gretton
Nando de Freitas
Arnaud Doucet
OffRL
56
18
0
21 May 2021
Deeply-Debiased Off-Policy Interval Estimation
C. Shi
Runzhe Wan
Victor Chernozhukov
R. Song
OffRL
30
36
0
10 May 2021
Towards Theoretical Understandings of Robust Markov Decision Processes: Sample Complexity and Asymptotics
Wenhao Yang
Liangyu Zhang
Zhihua Zhang
28
33
0
09 May 2021
Statistical Inference with M-Estimators on Adaptively Collected Data
Kelly W. Zhang
Lucas Janson
Susan Murphy
OffRL
19
41
0
29 Apr 2021
Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
Michael Ruogu Zhang
T. Paine
Ofir Nachum
Cosmin Paduraru
George Tucker
Ziyun Wang
Mohammad Norouzi
OffRL
30
45
0
28 Apr 2021
Universal Off-Policy Evaluation
Yash Chandak
S. Niekum
Bruno C. da Silva
Erik Learned-Miller
Emma Brunskill
Philip S. Thomas
OffRL
ELM
39
52
0
26 Apr 2021
Discovering an Aid Policy to Minimize Student Evasion Using Offline Reinforcement Learning
Leandro M. de Lima
R. Krohling
OffRL
45
4
0
20 Apr 2021
Off-Policy Risk Assessment in Contextual Bandits
Audrey Huang
Liu Leqi
Zachary Chase Lipton
Kamyar Azizzadenesheli
OffRL
32
36
0
18 Apr 2021
Benchmarks for Deep Off-Policy Evaluation
Justin Fu
Mohammad Norouzi
Ofir Nachum
George Tucker
Ziyun Wang
...
Yutian Chen
Aviral Kumar
Cosmin Paduraru
Sergey Levine
T. Paine
ELM
OffRL
35
100
0
30 Mar 2021
Learning Under Adversarial and Interventional Shifts
Harvineet Singh
Shalmali Joshi
Finale Doshi-Velez
Himabindu Lakkaraju
OOD
17
3
0
29 Mar 2021
Estimating the Long-Term Effects of Novel Treatments
Keith Battocchi
E. Dillon
Maggie Hei
Greg Lewis
M. Oprescu
Vasilis Syrgkanis
CML
22
10
0
15 Mar 2021
Learning robust driving policies without online exploration
D. Graves
Nhat M. Nguyen
Kimia Hassanzadeh
Jun Jin
Jun Luo
OffRL
14
2
0
15 Mar 2021
Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks
Thanh Nguyen-Tang
Sunil R. Gupta
Hung The Tran
Svetha Venkatesh
OffRL
70
7
0
11 Mar 2021
Causal-aware Safe Policy Improvement for Task-oriented dialogue
Govardana Sachithanandam Ramachandran
Kazuma Hashimoto
Caiming Xiong
OffRL
11
11
0
10 Mar 2021
Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds
Yihao Feng
Ziyang Tang
Na Zhang
Qiang Liu
OffRL
17
14
0
09 Mar 2021
Instabilities of Offline RL with Pre-Trained Neural Representation
Ruosong Wang
Yifan Wu
Ruslan Salakhutdinov
Sham Kakade
OffRL
24
42
0
08 Mar 2021
Personalization for Web-based Services using Offline Reinforcement Learning
P. Apostolopoulos
Zehui Wang
Hanson Wang
Chad Zhou
Kittipat Virochsiri
Norm Zhou
Igor L. Markov
OffRL
OnRL
27
7
0
10 Feb 2021
Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
Botao Hao
X. Ji
Yaqi Duan
Hao Lu
Csaba Szepesvári
Mengdi Wang
OffRL
11
37
0
06 Feb 2021
Fast Rates for the Regret of Offline Reinforcement Learning
Yichun Hu
Nathan Kallus
Masatoshi Uehara
OffRL
26
30
0
31 Jan 2021
High-Confidence Off-Policy (or Counterfactual) Variance Estimation
Yash Chandak
Shiv Shankar
Philip S. Thomas
OffRL
19
8
0
25 Jan 2021
Minimax Off-Policy Evaluation for Multi-Armed Bandits
Cong Ma
Banghua Zhu
Jiantao Jiao
Martin J. Wainwright
OffRL
16
10
0
19 Jan 2021
Off-Policy Evaluation of Slate Policies under Bayes Risk
N. Vlassis
Fernando Amat Gil
Ashok Chandrashekar
OffRL
19
3
0
05 Jan 2021
Is Pessimism Provably Efficient for Offline RL?
Ying Jin
Zhuoran Yang
Zhaoran Wang
OffRL
27
350
0
30 Dec 2020
Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL
Andrea Zanette
OffRL
31
71
0
14 Dec 2020
Offline Policy Selection under Uncertainty
Mengjiao Yang
Bo Dai
Ofir Nachum
George Tucker
Dale Schuurmans
OffRL
14
32
0
12 Dec 2020
Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies
Jinlin Lai
Lixin Zou
Jiaxing Song
OffRL
10
1
0
29 Nov 2020
Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
Botao Hao
Yaqi Duan
Tor Lattimore
Csaba Szepesvári
Mengdi Wang
OffRL
18
27
0
08 Nov 2020
Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity
Tanmay Gangwani
Jian Peng
Yuanshuo Zhou
29
10
0
05 Nov 2020
Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks
Julia Kreutzer
Stefan Riezler
Carolin (Haas) Lawrence
RALM
OffRL
13
15
0
04 Nov 2020
Off-Policy Interval Estimation with Lipschitz Value Iteration
Ziyang Tang
Yihao Feng
Na Zhang
Jian Peng
Qiang Liu
OffRL
17
6
0
29 Oct 2020
Contextual Latent-Movements Off-Policy Optimization for Robotic Manipulation Skills
Samuele Tosatto
Georgia Chalvatzaki
Jan Peters
26
12
0
26 Oct 2020