ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1511.03722
  4. Cited By
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

11 November 2015
Nan Jiang
Lihong Li
    OffRL
ArXivPDFHTML

Papers citing "Doubly Robust Off-policy Value Evaluation for Reinforcement Learning"

50 / 163 papers shown
Title
Human-centric Dialog Training via Offline Reinforcement Learning
Human-centric Dialog Training via Offline Reinforcement Learning
Natasha Jaques
J. Shen
Asma Ghandeharioun
Craig Ferguson
Àgata Lapedriza
Noah J. Jones
S. Gu
Rosalind W. Picard
OffRL
40
93
0
12 Oct 2020
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible
  Off-Policy Evaluation
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
Yuta Saito
Shunsuke Aihara
Megumi Matsutani
Yusuke Narita
OffRL
24
73
0
17 Aug 2020
Clinician-in-the-Loop Decision Making: Reinforcement Learning with
  Near-Optimal Set-Valued Policies
Clinician-in-the-Loop Decision Making: Reinforcement Learning with Near-Optimal Set-Valued Policies
Shengpu Tang
Aditya Modi
Michael Sjoding
Jenna Wiens
OffRL
22
25
0
24 Jul 2020
Batch Policy Learning in Average Reward Markov Decision Processes
Batch Policy Learning in Average Reward Markov Decision Processes
Peng Liao
Zhengling Qi
Runzhe Wan
P. Klasnja
Susan Murphy
OffRL
36
81
0
23 Jul 2020
Hyperparameter Selection for Offline Reinforcement Learning
Hyperparameter Selection for Offline Reinforcement Learning
T. Paine
Cosmin Paduraru
Andrea Michi
Çağlar Gülçehre
Konrad Zolna
Alexander Novikov
Ziyun Wang
Nando de Freitas
GP
OffRL
49
146
0
17 Jul 2020
Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation
  for Reinforcement Learning
Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning
Ming Yin
Yu Bai
Yu Wang
OffRL
44
31
0
07 Jul 2020
Off-policy Bandits with Deficient Support
Off-policy Bandits with Deficient Support
Noveen Sachdeva
Yi-Hsun Su
Thorsten Joachims
OffRL
38
75
0
16 Jun 2020
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
Ashvin Nair
Abhishek Gupta
Murtaza Dalal
Sergey Levine
OffRL
OnRL
46
592
0
16 Jun 2020
Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic
  Policies
Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies
Nathan Kallus
Masatoshi Uehara
OffRL
16
15
0
06 Jun 2020
Optimizing for the Future in Non-Stationary MDPs
Optimizing for the Future in Non-Stationary MDPs
Yash Chandak
Georgios Theocharous
Shiv Shankar
Martha White
Sridhar Mahadevan
Philip S. Thomas
OffRL
18
65
0
17 May 2020
Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning
Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning
Shangtong Zhang
Bo Liu
Shimon Whiteson
29
38
0
22 Apr 2020
Black-box Off-policy Estimation for Infinite-Horizon Reinforcement
  Learning
Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning
Ali Mousavi
Lihong Li
Qiang Liu
Denny Zhou
OffRL
27
32
0
24 Mar 2020
Off-policy Policy Evaluation For Sequential Decisions Under Unobserved
  Confounding
Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding
Hongseok Namkoong
Ramtin Keramati
Steve Yadlowsky
Emma Brunskill
OffRL
24
63
0
12 Mar 2020
Batch Stationary Distribution Estimation
Batch Stationary Distribution Estimation
Junfeng Wen
Bo Dai
Lihong Li
Dale Schuurmans
OffRL
22
22
0
02 Mar 2020
Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
Yaqi Duan
Mengdi Wang
OffRL
32
149
0
21 Feb 2020
Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement
  Learning
Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning
Nathan Kallus
Angela Zhou
OffRL
38
58
0
11 Feb 2020
Estimating Counterfactual Treatment Outcomes over Time Through
  Adversarially Balanced Representations
Estimating Counterfactual Treatment Outcomes over Time Through Adversarially Balanced Representations
Ioana Bica
Ahmed Alaa
James Jordon
M. Schaar
BDL
CML
16
180
0
10 Feb 2020
Interpretable Off-Policy Evaluation in Reinforcement Learning by
  Highlighting Influential Transitions
Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions
Omer Gottesman
Joseph D. Futoma
Yao Liu
Soanli Parbhoo
Leo Anthony Celi
Emma Brunskill
Finale Doshi-Velez
OffRL
147
56
0
10 Feb 2020
Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
Nan Jiang
Jiawei Huang
OffRL
41
17
0
06 Feb 2020
Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement
  Learning Framework
Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework
C. Shi
Xiaoyu Wang
Shuang Luo
Hongtu Zhu
Jieping Ye
R. Song
CML
OffRL
32
33
0
05 Feb 2020
Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement
  Learning
Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning
Ming Yin
Yu Wang
OffRL
29
80
0
29 Jan 2020
Reinforcement Learning via Fenchel-Rockafellar Duality
Reinforcement Learning via Fenchel-Rockafellar Duality
Ofir Nachum
Bo Dai
OffRL
16
118
0
07 Jan 2020
Off-Policy Estimation of Long-Term Average Outcomes with Applications to
  Mobile Health
Off-Policy Estimation of Long-Term Average Outcomes with Applications to Mobile Health
Peng Liao
P. Klasnja
Susan Murphy
OffRL
27
66
0
30 Dec 2019
Empirical Study of Off-Policy Policy Evaluation for Reinforcement
  Learning
Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning
Cameron Voloshin
Hoang Minh Le
Nan Jiang
Yisong Yue
OffRL
32
152
0
15 Nov 2019
Minimax Weight and Q-Function Learning for Off-Policy Evaluation
Minimax Weight and Q-Function Learning for Off-Policy Evaluation
Masatoshi Uehara
Jiawei Huang
Nan Jiang
OffRL
31
184
0
28 Oct 2019
BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement
  Learning
BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning
Xinyue Chen
Zijian Zhou
Junyao Xing
Che Wang
Yanqiu Wu
Keith Ross
OffRL
35
121
0
27 Oct 2019
From Importance Sampling to Doubly Robust Policy Gradient
From Importance Sampling to Doubly Robust Policy Gradient
Jiawei Huang
Nan Jiang
OffRL
35
24
0
20 Oct 2019
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
Ziyang Tang
Yihao Feng
Lihong Li
Dengyong Zhou
Qiang Liu
OffRL
30
67
0
16 Oct 2019
Understanding the Curse of Horizon in Off-Policy Evaluation via
  Conditional Importance Sampling
Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
Yao Liu
Pierre-Luc Bacon
Emma Brunskill
OffRL
22
45
0
15 Oct 2019
Meta-Q-Learning
Meta-Q-Learning
Rasool Fakoor
Pratik Chaudhari
Stefano Soatto
Alex Smola
OffRL
33
145
0
30 Sep 2019
Causal Modeling for Fairness in Dynamical Systems
Causal Modeling for Fairness in Dynamical Systems
Elliot Creager
David Madras
T. Pitassi
R. Zemel
29
67
0
18 Sep 2019
Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with
  Double Reinforcement Learning
Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
Nathan Kallus
Masatoshi Uehara
OffRL
26
88
0
12 Sep 2019
Personalized HeartSteps: A Reinforcement Learning Algorithm for
  Optimizing Physical Activity
Personalized HeartSteps: A Reinforcement Learning Algorithm for Optimizing Physical Activity
Peng Liao
Kristjan Greenewald
P. Klasnja
Susan Murphy
25
83
0
08 Sep 2019
Double Reinforcement Learning for Efficient Off-Policy Evaluation in
  Markov Decision Processes
Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes
Nathan Kallus
Masatoshi Uehara
OffRL
43
183
0
22 Aug 2019
Doubly-Robust Lasso Bandit
Doubly-Robust Lasso Bandit
Gi-Soo Kim
M. Paik
24
61
0
26 Jul 2019
Task Selection Policies for Multitask Learning
Task Selection Policies for Multitask Learning
John Glover
Chris Hokamp
OffRL
29
7
0
14 Jul 2019
A Review of Robot Learning for Manipulation: Challenges,
  Representations, and Algorithms
A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms
Oliver Kroemer
S. Niekum
George Konidaris
41
356
0
06 Jul 2019
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary
  Distribution Corrections
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
Ofir Nachum
Yinlam Chow
Bo Dai
Lihong Li
OffRL
13
328
0
10 Jun 2019
Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for
  Reinforcement Learning
Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
Nathan Kallus
Masatoshi Uehara
OffRL
24
54
0
09 Jun 2019
Learning When-to-Treat Policies
Learning When-to-Treat Policies
Xinkun Nie
Emma Brunskill
Stefan Wager
CML
OffRL
26
89
0
23 May 2019
Experimental Evaluation of Individualized Treatment Rules
Experimental Evaluation of Individualized Treatment Rules
Kosuke Imai
Michael Lingzhi Li
9
38
0
14 May 2019
Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs
Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs
Marek Petrik
R. Russel
29
61
0
20 Feb 2019
Imitation-Regularized Offline Learning
Imitation-Regularized Offline Learning
Yifei Ma
Yu Wang
Balakrishnan
Balakrishnan Narayanaswamy
OffRL
16
22
0
15 Jan 2019
Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search
Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search
Lars Buesing
T. Weber
Yori Zwols
S. Racanière
A. Guez
Jean-Baptiste Lespiau
N. Heess
CML
37
135
0
15 Nov 2018
Policy Certificates: Towards Accountable Reinforcement Learning
Policy Certificates: Towards Accountable Reinforcement Learning
Christoph Dann
Ashutosh Adhikari
Wei Wei
Jimmy J. Lin
OffRL
25
141
0
07 Nov 2018
Neural Approaches to Conversational AI
Neural Approaches to Conversational AI
Jianfeng Gao
Michel Galley
Lihong Li
49
670
0
21 Sep 2018
Per-decision Multi-step Temporal Difference Learning with Control
  Variates
Per-decision Multi-step Temporal Difference Learning with Control Variates
Kristopher De Asis
R. Sutton
24
7
0
05 Jul 2018
Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration
  Matters
Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters
Aniruddh Raghu
Omer Gottesman
Yao Liu
Matthieu Komorowski
A. Faisal
Finale Doshi-Velez
Emma Brunskill
OffRL
33
33
0
03 Jul 2018
Reliability and Learnability of Human Bandit Feedback for
  Sequence-to-Sequence Reinforcement Learning
Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning
Julia Kreutzer
Joshua Uyheng
Stefan Riezler
33
85
0
27 May 2018
Representation Balancing MDPs for Off-Policy Policy Evaluation
Representation Balancing MDPs for Off-Policy Policy Evaluation
Yao Liu
Omer Gottesman
Aniruddh Raghu
Matthieu Komorowski
A. Faisal
Finale Doshi-Velez
Emma Brunskill
OffRL
27
75
0
23 May 2018
Previous
1234
Next