ResearchTrend.AI
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning

4 April 2016
Philip S. Thomas
Emma Brunskill
    OffRL

Papers citing "Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning"

50 / 342 papers shown
Dr Jekyll and Mr Hyde: the Strange Case of Off-Policy Policy Updates
Romain Laroche
Rémi Tachet des Combes
46
8
0
29 Sep 2021
A Spectral Approach to Off-Policy Evaluation for POMDPs
Yash Nair
Nan Jiang
OffRL
26
17
0
22 Sep 2021
Accelerating Offline Reinforcement Learning Application in Real-Time Bidding and Recommendation: Potential Use of Simulation
Haruka Kiyohara
K. Kawakami
Yuta Saito
OffRL
32
12
0
17 Sep 2021
State Relevance for Off-Policy Evaluation
S. Shen
Yecheng Ma
Omer Gottesman
Finale Doshi-Velez
OffRL
16
4
0
13 Sep 2021
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Andrea Zanette
Martin J. Wainwright
Emma Brunskill
OffRL
34
115
0
19 Aug 2021
Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings
Shengpu Tang
Jenna Wiens
OffRL
26
78
0
23 Jul 2021
Conservative Offline Distributional Reinforcement Learning
Yecheng Jason Ma
Dinesh Jayaraman
Osbert Bastani
OffRL
73
79
0
12 Jul 2021
Supervised Off-Policy Ranking
Yue Jin
Yue Zhang
Tao Qin
Xudong Zhang
Jian Yuan
Houqiang Li
Tie-Yan Liu
OffRL
37
5
0
03 Jul 2021
On component interactions in two-stage recommender systems
Jiri Hron
K. Krauth
Michael I. Jordan
Niki Kilbertus
CML
LRM
42
31
0
28 Jun 2021
Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation
Yunhao Tang
Tadashi Kozuno
Mark Rowland
Rémi Munos
Michal Valko
OffRL
27
9
0
24 Jun 2021
Variance-Aware Off-Policy Evaluation with Linear Function Approximation
Yifei Min
Tianhao Wang
Dongruo Zhou
Quanquan Gu
OffRL
42
38
0
22 Jun 2021
Control Variates for Slate Off-Policy Evaluation
N. Vlassis
Ashok Chandrashekar
Fernando Amat Gil
Nathan Kallus
OffRL
28
9
0
15 Jun 2021
A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation
Scott Fujimoto
David Meger
Doina Precup
10
16
0
12 Jun 2021
Recomposing the Reinforcement Learning Building Blocks with Hypernetworks
Shai Keynan
Elad Sarafian
Sarit Kraus
OffRL
23
29
0
12 Jun 2021
Robust Generalization despite Distribution Shift via Minimum Discriminating Information
Tobias Sutter
Andreas Krause
Daniel Kuhn
OOD
27
10
0
08 Jun 2021
Offline Policy Comparison under Limited Historical Agent-Environment Interactions
Anton Dereventsov
Joseph Daws
Clayton Webster
OffRL
34
3
0
07 Jun 2021
Post-Contextual-Bandit Inference
Aurélien F. Bibaut
Antoine Chambaz
Maria Dimakopoulou
Nathan Kallus
Mark van der Laan
32
39
0
01 Jun 2021
Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs
Harsh Satija
Philip S. Thomas
Joelle Pineau
Romain Laroche
OffRL
43
21
0
31 May 2021
A unified view of likelihood ratio and reparameterization gradients
Paavo Parmas
Masashi Sugiyama
28
9
0
31 May 2021
On Instrumental Variable Regression for Deep Offline Policy Evaluation
Yutian Chen
Liyuan Xu
Çağlar Gülçehre
T. Paine
Arthur Gretton
Nando de Freitas
Arnaud Doucet
OffRL
56
18
0
21 May 2021
Deeply-Debiased Off-Policy Interval Estimation
C. Shi
Runzhe Wan
Victor Chernozhukov
R. Song
OffRL
30
36
0
10 May 2021
Towards Theoretical Understandings of Robust Markov Decision Processes: Sample Complexity and Asymptotics
Wenhao Yang
Liangyu Zhang
Zhihua Zhang
28
33
0
09 May 2021
Statistical Inference with M-Estimators on Adaptively Collected Data
Kelly W. Zhang
Lucas Janson
Susan Murphy
OffRL
19
41
0
29 Apr 2021
Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
Michael Ruogu Zhang
T. Paine
Ofir Nachum
Cosmin Paduraru
George Tucker
Ziyun Wang
Mohammad Norouzi
OffRL
30
45
0
28 Apr 2021
Universal Off-Policy Evaluation
Yash Chandak
S. Niekum
Bruno C. da Silva
Erik Learned-Miller
Emma Brunskill
Philip S. Thomas
OffRL
ELM
39
52
0
26 Apr 2021
Discovering an Aid Policy to Minimize Student Evasion Using Offline Reinforcement Learning
Leandro M. de Lima
R. Krohling
OffRL
45
4
0
20 Apr 2021
Off-Policy Risk Assessment in Contextual Bandits
Audrey Huang
Liu Leqi
Zachary Chase Lipton
Kamyar Azizzadenesheli
OffRL
32
36
0
18 Apr 2021
Benchmarks for Deep Off-Policy Evaluation
Justin Fu
Mohammad Norouzi
Ofir Nachum
George Tucker
Ziyun Wang
...
Yutian Chen
Aviral Kumar
Cosmin Paduraru
Sergey Levine
T. Paine
ELM
OffRL
35
100
0
30 Mar 2021
Learning Under Adversarial and Interventional Shifts
Harvineet Singh
Shalmali Joshi
Finale Doshi-Velez
Himabindu Lakkaraju
OOD
17
3
0
29 Mar 2021
Estimating the Long-Term Effects of Novel Treatments
Keith Battocchi
E. Dillon
Maggie Hei
Greg Lewis
M. Oprescu
Vasilis Syrgkanis
CML
22
10
0
15 Mar 2021
Learning robust driving policies without online exploration
D. Graves
Nhat M. Nguyen
Kimia Hassanzadeh
Jun Jin
Jun Luo
OffRL
14
2
0
15 Mar 2021
Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks
Thanh Nguyen-Tang
Sunil R. Gupta
Hung The Tran
Svetha Venkatesh
OffRL
70
7
0
11 Mar 2021
Causal-aware Safe Policy Improvement for Task-oriented dialogue
Govardana Sachithanandam Ramachandran
Kazuma Hashimoto
Caiming Xiong
OffRL
11
11
0
10 Mar 2021
Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds
Yihao Feng
Ziyang Tang
Na Zhang
Qiang Liu
OffRL
17
14
0
09 Mar 2021
Instabilities of Offline RL with Pre-Trained Neural Representation
Ruosong Wang
Yifan Wu
Ruslan Salakhutdinov
Sham Kakade
OffRL
24
42
0
08 Mar 2021
Personalization for Web-based Services using Offline Reinforcement Learning
P. Apostolopoulos
Zehui Wang
Hanson Wang
Chad Zhou
Kittipat Virochsiri
Norm Zhou
Igor L. Markov
OffRL
OnRL
27
7
0
10 Feb 2021
Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
Botao Hao
X. Ji
Yaqi Duan
Hao Lu
Csaba Szepesvári
Mengdi Wang
OffRL
11
37
0
06 Feb 2021
Fast Rates for the Regret of Offline Reinforcement Learning
Yichun Hu
Nathan Kallus
Masatoshi Uehara
OffRL
26
30
0
31 Jan 2021
High-Confidence Off-Policy (or Counterfactual) Variance Estimation
Yash Chandak
Shiv Shankar
Philip S. Thomas
OffRL
19
8
0
25 Jan 2021
Minimax Off-Policy Evaluation for Multi-Armed Bandits
Cong Ma
Banghua Zhu
Jiantao Jiao
Martin J. Wainwright
OffRL
16
10
0
19 Jan 2021
Off-Policy Evaluation of Slate Policies under Bayes Risk
N. Vlassis
Fernando Amat Gil
Ashok Chandrashekar
OffRL
19
3
0
05 Jan 2021
Is Pessimism Provably Efficient for Offline RL?
Ying Jin
Zhuoran Yang
Zhaoran Wang
OffRL
27
350
0
30 Dec 2020
Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL
Andrea Zanette
OffRL
31
71
0
14 Dec 2020
Offline Policy Selection under Uncertainty
Mengjiao Yang
Bo Dai
Ofir Nachum
George Tucker
Dale Schuurmans
OffRL
14
32
0
12 Dec 2020
Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies
Jinlin Lai
Lixin Zou
Jiaxing Song
OffRL
10
1
0
29 Nov 2020
Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
Botao Hao
Yaqi Duan
Tor Lattimore
Csaba Szepesvári
Mengdi Wang
OffRL
18
27
0
08 Nov 2020
Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity
Tanmay Gangwani
Jian Peng
Yuanshuo Zhou
29
10
0
05 Nov 2020
Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks
Julia Kreutzer
Stefan Riezler
Carolin (Haas) Lawrence
RALM
OffRL
13
15
0
04 Nov 2020
Off-Policy Interval Estimation with Lipschitz Value Iteration
Ziyang Tang
Yihao Feng
Na Zhang
Jian Peng
Qiang Liu
OffRL
17
6
0
29 Oct 2020
Contextual Latent-Movements Off-Policy Optimization for Robotic Manipulation Skills
Samuele Tosatto
Georgia Chalvatzaki
Jan Peters
26
12
0
26 Oct 2020