ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2103.16596
  4. Cited By
Benchmarks for Deep Off-Policy Evaluation

Benchmarks for Deep Off-Policy Evaluation

30 March 2021
Justin Fu
Mohammad Norouzi
Ofir Nachum
George Tucker
Ziyun Wang
Alexander Novikov
Mengjiao Yang
Michael Ruogu Zhang
Yutian Chen
Aviral Kumar
Cosmin Paduraru
Sergey Levine
T. Paine
    ELM
    OffRL
ArXivPDFHTML

Papers citing "Benchmarks for Deep Off-Policy Evaluation"

44 / 44 papers shown
Title
Prompt Optimization with Logged Bandit Data
Prompt Optimization with Logged Bandit Data
Haruka Kiyohara
Daniel Yiming Cao
Yuta Saito
Thorsten Joachims
116
0
0
03 Apr 2025
Foundational Policy Acquisition via Multitask Learning for Motor Skill Generation
Foundational Policy Acquisition via Multitask Learning for Motor Skill Generation
Satoshi Yamamori
Jun Morimoto
42
0
0
31 Aug 2023
Autoregressive Dynamics Models for Offline Policy Evaluation and
  Optimization
Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
Michael Ruogu Zhang
T. Paine
Ofir Nachum
Cosmin Paduraru
George Tucker
Ziyun Wang
Mohammad Norouzi
OffRL
54
46
0
28 Apr 2021
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible
  Off-Policy Evaluation
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
Yuta Saito
Shunsuke Aihara
Megumi Matsutani
Yusuke Narita
OffRL
78
74
0
17 Aug 2020
Statistical Bootstrapping for Uncertainty Estimation in Off-Policy
  Evaluation
Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation
Ilya Kostrikov
Ofir Nachum
OffRL
22
30
0
27 Jul 2020
Hyperparameter Selection for Offline Reinforcement Learning
Hyperparameter Selection for Offline Reinforcement Learning
T. Paine
Cosmin Paduraru
Andrea Michi
Çağlar Gülçehre
Konrad Zolna
Alexander Novikov
Ziyun Wang
Nando de Freitas
GP
OffRL
94
147
0
17 Jul 2020
Off-Policy Evaluation via the Regularized Lagrangian
Off-Policy Evaluation via the Regularized Lagrangian
Mengjiao Yang
Ofir Nachum
Bo Dai
Lihong Li
Dale Schuurmans
OffRL
14
115
0
07 Jul 2020
Critic Regularized Regression
Critic Regularized Regression
Ziyun Wang
Alexander Novikov
Konrad Zolna
Jost Tobias Springenberg
Scott E. Reed
...
Noah Y. Siegel
J. Merel
Çağlar Gülçehre
N. Heess
Nando de Freitas
OffRL
117
320
0
26 Jun 2020
RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning
RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning
Çağlar Gülçehre
Ziyun Wang
Alexander Novikov
T. Paine
Sergio Gomez Colmenarejo
...
Matthew W. Hoffman
Ofir Nachum
George Tucker
N. Heess
Nando de Freitas
OffRL
42
71
0
24 Jun 2020
Acme: A Research Framework for Distributed Reinforcement Learning
Acme: A Research Framework for Distributed Reinforcement Learning
Matthew W. Hoffman
Bobak Shahriari
John Aslanides
Gabriel Barth-Maron
Nikola Momchev
...
Srivatsan Srinivasan
A. Cowie
Ziyun Wang
Bilal Piot
Nando de Freitas
90
225
0
01 Jun 2020
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
Justin Fu
Aviral Kumar
Ofir Nachum
George Tucker
Sergey Levine
GP
OffRL
167
1,338
0
15 Apr 2020
Batch Stationary Distribution Estimation
Batch Stationary Distribution Estimation
Junfeng Wen
Bo Dai
Lihong Li
Dale Schuurmans
OffRL
51
22
0
02 Mar 2020
Keep Doing What Worked: Behavioral Modelling Priors for Offline
  Reinforcement Learning
Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning
Noah Y. Siegel
Jost Tobias Springenberg
Felix Berkenkamp
A. Abdolmaleki
Michael Neunert
Thomas Lampe
Roland Hafner
Nicolas Heess
Martin Riedmiller
OffRL
31
283
0
19 Feb 2020
Empirical Study of Off-Policy Policy Evaluation for Reinforcement
  Learning
Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning
Cameron Voloshin
Hoang Minh Le
Nan Jiang
Yisong Yue
OffRL
35
154
0
15 Nov 2019
When to Trust Your Model: Model-Based Policy Optimization
When to Trust Your Model: Model-Based Policy Optimization
Michael Janner
Justin Fu
Marvin Zhang
Sergey Levine
OffRL
48
939
0
19 Jun 2019
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary
  Distribution Corrections
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
Ofir Nachum
Yinlam Chow
Bo Dai
Lihong Li
OffRL
66
332
0
10 Jun 2019
Off-Policy Evaluation via Off-Policy Classification
Off-Policy Evaluation via Off-Policy Classification
A. Irpan
Kanishka Rao
Konstantinos Bousmalis
Chris Harris
Julian Ibarz
Sergey Levine
OffRL
28
50
0
04 Jun 2019
Learning When-to-Treat Policies
Learning When-to-Treat Policies
Xinkun Nie
Emma Brunskill
Stefan Wager
CML
OffRL
40
90
0
23 May 2019
Batch Policy Learning under Constraints
Batch Policy Learning under Constraints
Hoang Minh Le
Cameron Voloshin
Yisong Yue
OffRL
40
328
0
20 Mar 2019
Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error
  Reduction via Surrogate Policy
Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy
Yuan Xie
Boyi Liu
Qiang Liu
Zhaoran Wang
Yuanshuo Zhou
Jian-wei Peng
OffRL
26
19
0
01 Aug 2018
Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration
  Matters
Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters
Aniruddh Raghu
Omer Gottesman
Yao Liu
Matthieu Komorowski
A. Faisal
Finale Doshi-Velez
Emma Brunskill
OffRL
37
34
0
03 Jul 2018
Learning to Drive in a Day
Learning to Drive in a Day
Alex Kendall
Jeffrey Hawke
David Janz
Przemyslaw Mazur
Daniele Reda
John M. Allen
Vinh-Dieu Lam
Alex Bewley
Amar Shah
66
649
0
01 Jul 2018
QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic
  Manipulation
QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
Dmitry Kalashnikov
A. Irpan
P. Pastor
Julian Ibarz
Alexander Herzog
...
Deirdre Quillen
E. Holly
Mrinal Kalakrishnan
Vincent Vanhoucke
Sergey Levine
90
1,454
0
27 Jun 2018
Importance Sampling Policy Evaluation with an Estimated Behavior Policy
Importance Sampling Policy Evaluation with an Estimated Behavior Policy
Josiah P. Hanna
S. Niekum
Peter Stone
OffRL
23
67
0
04 Jun 2018
Deep Reinforcement Learning in a Handful of Trials using Probabilistic
  Dynamics Models
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
Kurtland Chua
Roberto Calandra
R. McAllister
Sergey Levine
BDL
125
1,263
0
30 May 2018
Addressing Function Approximation Error in Actor-Critic Methods
Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto
H. V. Hoof
David Meger
OffRL
132
5,121
0
26 Feb 2018
Offline A/B testing for Recommender Systems
Offline A/B testing for Recommender Systems
Alexandre Gilotte
Clément Calauzènes
Thomas Nedelec
A. Abraham
Simon Dollé
OffRL
53
220
0
22 Jan 2018
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement
  Learning with a Stochastic Actor
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine
172
8,236
0
04 Jan 2018
Learning Complex Dexterous Manipulation with Deep Reinforcement Learning
  and Demonstrations
Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations
Aravind Rajeswaran
Vikash Kumar
Abhishek Gupta
Giulia Vezzani
John Schulman
E. Todorov
Sergey Levine
85
1,079
0
28 Sep 2017
Proximal Policy Optimization Algorithms
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
185
18,685
0
20 Jul 2017
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
Yu Wang
Alekh Agarwal
Miroslav Dudík
OffRL
47
220
0
04 Dec 2016
Deep Reinforcement Learning for Robotic Manipulation with Asynchronous
  Off-Policy Updates
Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates
S. Gu
E. Holly
Timothy Lillicrap
Sergey Levine
OffRL
SSL
82
1,474
0
03 Oct 2016
Safe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement Learning
Rémi Munos
T. Stepleton
Anna Harutyunyan
Marc G. Bellemare
OffRL
105
611
0
08 Jun 2016
Off-policy evaluation for slate recommendation
Off-policy evaluation for slate recommendation
Adith Swaminathan
A. Krishnamurthy
Alekh Agarwal
Miroslav Dudík
John Langford
Damien Jose
I. Zitouni
CML
OffRL
28
225
0
16 May 2016
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
Philip S. Thomas
Emma Brunskill
OffRL
150
573
0
04 Apr 2016
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
Nan Jiang
Lihong Li
OffRL
94
621
0
11 Nov 2015
Continuous control with deep reinforcement learning
Continuous control with deep reinforcement learning
Timothy Lillicrap
Jonathan J. Hunt
Alexander Pritzel
N. Heess
Tom Erez
Yuval Tassa
David Silver
Daan Wierstra
143
13,174
0
09 Sep 2015
An Emphatic Approach to the Problem of Off-policy Temporal-Difference
  Learning
An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
R. Sutton
A. R. Mahmood
Martha White
45
267
0
14 Mar 2015
Doubly Robust Policy Evaluation and Optimization
Doubly Robust Policy Evaluation and Optimization
Miroslav Dudík
D. Erhan
John Langford
Lihong Li
OffRL
88
283
0
10 Mar 2015
Trust Region Policy Optimization
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
221
6,722
0
19 Feb 2015
Playing Atari with Deep Reinforcement Learning
Playing Atari with Deep Reinforcement Learning
Volodymyr Mnih
Koray Kavukcuoglu
David Silver
Alex Graves
Ioannis Antonoglou
Daan Wierstra
Martin Riedmiller
63
12,163
0
19 Dec 2013
Counterfactual Reasoning and Learning Systems
Counterfactual Reasoning and Learning Systems
Léon Bottou
J. Peters
J. Q. Candela
Denis Xavier Charles
D. M. Chickering
Elon Portugaly
Dipankar Ray
Patrice Y. Simard
Edward Snelson
CML
OffRL
123
781
0
11 Sep 2012
The Arcade Learning Environment: An Evaluation Platform for General
  Agents
The Arcade Learning Environment: An Evaluation Platform for General Agents
Marc G. Bellemare
Yavar Naddaf
J. Veness
Michael Bowling
54
2,992
0
19 Jul 2012
A Contextual-Bandit Approach to Personalized News Article Recommendation
A Contextual-Bandit Approach to Personalized News Article Recommendation
Lihong Li
Wei Chu
John Langford
Robert Schapire
219
2,935
0
28 Feb 2010
1