ResearchTrend.AI
Safe and Efficient Off-Policy Reinforcement Learning

8 June 2016
Rémi Munos
T. Stepleton
Anna Harutyunyan
Marc G. Bellemare
    OffRL

Papers citing "Safe and Efficient Off-Policy Reinforcement Learning"

50 / 374 papers shown
Importance Sampling Placement in Off-Policy Temporal-Difference Methods
Eric Graves
Sina Ghiassian
OffRL
77
2
0
18 Mar 2022
On Credit Assignment in Hierarchical Reinforcement Learning
Joery A. de Vries
Thomas M. Moerland
Aske Plaat
23
0
0
07 Mar 2022
Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning
Durgesh Kalwar
Omkar Shelke
Somjit Nath
Hardik Meisheri
H. Khadilkar
56
1
0
02 Mar 2022
Learning Robust Real-Time Cultural Transmission without Human Data
Cultural General Intelligence Team
Avishkar Bhoopchand
Bethanie Brownfield
Adrian Collister
Agustin Dal Lago
...
Alex Platonov
Evan Senter
Sukhdeep Singh
Alexander Zacherl
Lei M. Zhang
VLM
123
11
0
01 Mar 2022
Sequential Bayesian experimental designs via reinforcement learning
Hikaru Asano
OffRL
80
0
0
14 Feb 2022
Constrained Variational Policy Optimization for Safe Reinforcement Learning
Zuxin Liu
Zhepeng Cen
Vladislav Isenbaev
Wei Liu
Zhiwei Steven Wu
Yue Liu
Ding Zhao
98
81
0
28 Jan 2022
The Challenges of Exploration for Offline Reinforcement Learning
Nathan Lambert
Markus Wulfmeier
William F. Whitney
Arunkumar Byravan
Michael Bloesch
Vibhavari Dasagi
Tim Hertweck
Martin Riedmiller
OffRL
91
29
0
27 Jan 2022
Boosting Exploration in Multi-Task Reinforcement Learning using Adversarial Networks
Ramnath Kumar
T. Deleu
Yoshua Bengio
43
0
0
27 Jan 2022
Priors, Hierarchy, and Information Asymmetry for Skill Transfer in Reinforcement Learning
Sasha Salter
Kristian Hartikainen
Walter Goodwin
Ingmar Posner
OffRL
63
5
0
20 Jan 2022
Chaining Value Functions for Off-Policy Learning
Simon Schmitt
John Shawe-Taylor
Hado van Hasselt
OffRL
54
3
0
17 Jan 2022
Improving the Efficiency of Off-Policy Reinforcement Learning by Accounting for Past Decisions
Brett Daley
Chris Amato
OffRL
99
1
0
23 Dec 2021
Model-Value Inconsistency as a Signal for Epistemic Uncertainty
Angelos Filos
Eszter Vértes
Zita Marinho
Gregory Farquhar
Diana Borsa
A. Friesen
Feryal M. P. Behbahani
Tom Schaul
André Barreto
Simon Osindero
104
7
0
08 Dec 2021
A Review for Deep Reinforcement Learning in Atari: Benchmarks, Challenges, and Solutions
Jiajun Fan
OffRL
85
21
0
08 Dec 2021
Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks
Linghui Meng
Muning Wen
Yaodong Yang
Chenyang Le
Xiyun Li
Weinan Zhang
Ying Wen
Haifeng Zhang
Jun Wang
Bo Xu
OffRL
98
43
0
06 Dec 2021
Reinforcement Explanation Learning
Siddhant Agarwal
Owais Iqbal
Sree Aditya Buridi
Madda Manjusha
Abir Das
FAtt
31
0
0
26 Nov 2021
Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning
Nicolai Dorka
Tim Welschehold
Joschka Boedecker
Wolfram Burgard
OffRL
56
9
0
24 Nov 2021
Off-Policy Correction For Multi-Agent Reinforcement Learning
Michał Zawalski
Błażej Osiński
Henryk Michalewski
Piotr Miłoś
OffRL
76
2
0
22 Nov 2021
SOPE: Spectrum of Off-Policy Estimators
C. J. Yuan
Yash Chandak
S. Giguere
Philip S. Thomas
S. Niekum
OffRL
96
5
0
06 Nov 2021
Supervised Advantage Actor-Critic for Recommender Systems
Xin Xin
Alexandros Karatzoglou
Ioannis Arapakis
J. Jose
OffRL
66
30
0
05 Nov 2021
Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies
Tim Seyde
Igor Gilitschenski
Wilko Schwarting
Bartolomeo Stellato
Martin Riedmiller
Markus Wulfmeier
Daniela Rus
103
44
0
03 Nov 2021
Self-Consistent Models and Values
Gregory Farquhar
Kate Baumli
Zita Marinho
Angelos Filos
Matteo Hessel
Hado van Hasselt
David Silver
91
8
0
25 Oct 2021
Variance Reduction based Experience Replay for Policy Optimization
Hua Zheng
Wei Xie
M. Feng
OffRL
90
2
0
17 Oct 2021
Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes
Alex X. Lee
Coline Devin
Yuxiang Zhou
Thomas Lampe
Konstantinos Bousmalis
...
Stefano Saliceti
Federico Casarini
Martin Riedmiller
R. Hadsell
F. Nori
OffRL
105
101
0
12 Oct 2021
Evaluating model-based planning and planner amortization for continuous control
Arunkumar Byravan
Leonard Hasenclever
Piotr Trochim
M. Berk Mirza
Alessandro Davide Ialongo
...
Jost Tobias Springenberg
A. Abdolmaleki
N. Heess
J. Merel
Martin Riedmiller
105
17
0
07 Oct 2021
Parameter-free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients
Baturay Saglam
Furkan B. Mutlu
Dogan C. Cicek
Suleyman S. Kozat
OffRL
53
3
0
24 Sep 2021
Estimation Error Correction in Deep Reinforcement Learning for Deterministic Actor-Critic Methods
Baturay Saglam
Enes Duran
Dogan C. Cicek
Furkan B. Mutlu
Suleyman S. Kozat
OffRL
73
12
0
22 Sep 2021
Direct Advantage Estimation
Hsiao-Ru Pan
Nico Gürtler
Alexander Neitz
Bernhard Schölkopf
OffRL, CML
58
13
0
13 Sep 2021
Language Model Augmented Relevance Score
Ruibo Liu
Jason W. Wei
Soroush Vosoughi
51
10
0
19 Aug 2021
Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning
Denis Yarats
Rob Fergus
A. Lazaric
Lerrel Pinto
OffRL
133
351
0
20 Jul 2021
Decoupled Reinforcement Learning to Stabilise Intrinsically-Motivated Exploration
Lukas Schafer
Filippos Christianos
Josiah P. Hanna
Stefano V. Albrecht
92
23
0
19 Jul 2021
Imitation by Predicting Observations
Andrew Jaegle
Yury Sulsky
Arun Ahuja
Jake Bruce
Rob Fergus
Greg Wayne
41
12
0
08 Jul 2021
Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction
Assaf Hallak
Gal Dalal
Steven Dalton
I. Frosio
Shie Mannor
Gal Chechik
OffRL, OnRL
100
10
0
04 Jul 2021
Supervised Off-Policy Ranking
Yue Jin
Yue Zhang
Tao Qin
Xudong Zhang
Jian Yuan
Houqiang Li
Tie-Yan Liu
OffRL
63
6
0
03 Jul 2021
Convergent and Efficient Deep Q Network Algorithm
Zhikang T. Wang
Masahito Ueda
80
12
0
29 Jun 2021
On component interactions in two-stage recommender systems
Jiri Hron
K. Krauth
Michael I. Jordan
Niki Kilbertus
CML, LRM
74
31
0
28 Jun 2021
Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation
Yunhao Tang
Tadashi Kozuno
Mark Rowland
Rémi Munos
Michal Valko
OffRL
136
9
0
24 Jun 2021
Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators
Zaiwei Chen
S. T. Maguluri
Sanjay Shakkottai
Karthikeyan Shanmugam
OffRL
83
13
0
24 Jun 2021
Scientific multi-agent reinforcement learning for wall-models of turbulent flows
H. J. Bae
Petros Koumoutsakos
AI4CE
71
136
0
21 Jun 2021
Scalable Safety-Critical Policy Evaluation with Accelerated Rare Event Sampling
Mengdi Xu
Peide Huang
Fengpei Li
Jiacheng Zhu
Xuewei Qi
K. Oguchi
Zhiyuan Huang
Henry Lam
Ding Zhao
40
4
0
19 Jun 2021
Offline RL Without Off-Policy Evaluation
David Brandfonbrener
William F. Whitney
Rajesh Ranganath
Joan Bruna
OffRL
110
170
0
16 Jun 2021
A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation
Scott Fujimoto
David Meger
Doina Precup
76
17
0
12 Jun 2021
GDI: Rethinking What Makes Reinforcement Learning Different From Supervised Learning
Jiajun Fan
Changnan Xiao
Yue Huang
OffRL
91
10
0
11 Jun 2021
An Empirical Comparison of Off-policy Prediction Learning Algorithms on the Collision Task
Sina Ghiassian
R. Sutton
AAML, OffRL
78
5
0
02 Jun 2021
An Entropy Regularization Free Mechanism for Policy-based Reinforcement Learning
Changnan Xiao
Haosen Shi
Jiajun Fan
Shihong Deng
71
5
0
01 Jun 2021
A unified view of likelihood ratio and reparameterization gradients
Paavo Parmas
Masashi Sugiyama
53
9
0
31 May 2021
Optimistic Reinforcement Learning by Forward Kullback-Leibler Divergence Optimization
Taisuke Kobayashi
68
14
0
27 May 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation
Zaiwei Chen
S. Khodadadian
S. T. Maguluri
OffRL
103
31
0
26 May 2021
From Motor Control to Team Play in Simulated Humanoid Football
Siqi Liu
Guy Lever
Zhe Wang
J. Merel
S. M. Ali Eslami
...
Tuomas Haarnoja
Brendan D. Tracey
K. Tuyls
T. Graepel
N. Heess
117
134
0
25 May 2021
Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning
Yue Wu
Shuangfei Zhai
Nitish Srivastava
J. Susskind
Jian Zhang
Ruslan Salakhutdinov
Hanlin Goh
EDL, OffRL, OnRL
80
190
0
17 May 2021
CASA: Bridging the Gap between Policy Improvement and Policy Evaluation with Conflict Averse Policy Iteration
Changnan Xiao
Haosen Shi
Jiajun Fan
Shihong Deng
Haiyan Yin
67
0
0
09 May 2021