ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Safe and Efficient Off-Policy Reinforcement Learning
arXiv: 1606.02647 (v2, latest)

8 June 2016
Rémi Munos
T. Stepleton
Anna Harutyunyan
Marc G. Bellemare
    OffRL

Papers citing "Safe and Efficient Off-Policy Reinforcement Learning"

50 / 374 papers shown
Mitigating Political Bias in Language Models Through Reinforced Calibration
Ruibo Liu
Chenyan Jia
Jason W. Wei
Guangxuan Xu
Lili Wang
Soroush Vosoughi
73
99
0
30 Apr 2021
Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients
Bozhidar Vasilev
Tarun Gupta
Bei Peng
Shimon Whiteson
38
2
0
27 Apr 2021
Multitasking Inhibits Semantic Drift
Athul Paul Jacob
M. Lewis
Jacob Andreas
86
13
0
15 Apr 2021
TAAC: Temporally Abstract Actor-Critic for Continuous Control
Haonan Yu
Wei Xu
Haichao Zhang
OffRL
56
21
0
13 Apr 2021
Muesli: Combining Improvements in Policy Optimization
Matteo Hessel
Ivo Danihelka
Fabio Viola
A. Guez
Simon Schmitt
Laurent Sifre
T. Weber
David Silver
H. V. Hasselt
105
66
0
13 Apr 2021
Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning
Hiroki Furuta
Tadashi Kozuno
T. Matsushima
Y. Matsuo
S. Gu
120
14
0
31 Mar 2021
Benchmarks for Deep Off-Policy Evaluation
Justin Fu
Mohammad Norouzi
Ofir Nachum
George Tucker
Ziyun Wang
...
Yutian Chen
Aviral Kumar
Cosmin Paduraru
Sergey Levine
T. Paine
ELM, OffRL
90
104
0
30 Mar 2021
Discovering Diverse Solutions in Deep Reinforcement Learning by Maximizing State-Action-Based Mutual Information
Takayuki Osa
Voot Tangkaratt
Masashi Sugiyama
73
33
0
12 Mar 2021
Model-free Policy Learning with Reward Gradients
Qingfeng Lan
Samuele Tosatto
Homayoon Farrahi
Rupam Mahmood
49
6
0
09 Mar 2021
Revisiting Peng's Q(λ) for Modern Reinforcement Learning
Tadashi Kozuno
Yunhao Tang
Mark Rowland
Rémi Munos
Steven Kapturowski
Will Dabney
Michal Valko
David Abel
OffRL
63
19
0
27 Feb 2021
Bias-reduced Multi-step Hindsight Experience Replay for Efficient Multi-goal Reinforcement Learning
Rui Yang
Jiafei Lyu
Yu Yang
Jiangpeng Yan
Feng Luo
Dijun Luo
Lanqing Li
Xiu Li
88
6
0
25 Feb 2021
Sequential Learning-based IaaS Composition
Sajib Mistry
Sheik Mohammad Mostakim Fattah
A. Bouguettaya
49
0
0
24 Feb 2021
Beyond Fine-Tuning: Transferring Behavior in Reinforcement Learning
Victor Campos
Pablo Sprechmann
Steven Hansen
André Barreto
Steven Kapturowski
Alex Vitvitskyi
Adria Puigdomenech Badia
Charles Blundell
OffRL, OnRL
83
26
0
24 Feb 2021
Greedy-Step Off-Policy Reinforcement Learning
Yuhui Wang
Qingyuan Wu
Pengcheng He
Xiaoyang Tan
OffRL
50
1
0
23 Feb 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
S. Khodadadian
Zaiwei Chen
S. T. Maguluri
CML, OffRL
137
27
0
18 Feb 2021
Learning Memory-Dependent Continuous Control from Demonstrations
Siqing Hou
Dongqi Han
Jun Tani
23
0
0
18 Feb 2021
Q-Value Weighted Regression: Reinforcement Learning with Limited Data
Piotr Kozakowski
Lukasz Kaiser
Henryk Michalewski
Afroz Mohiuddin
Katarzyna Kañska
OffRL
73
5
0
12 Feb 2021
Adversarially Guided Actor-Critic
Yannis Flet-Berliac
Johan Ferret
Olivier Pietquin
Philippe Preux
Matthieu Geist
73
73
0
08 Feb 2021
A review of motion planning algorithms for intelligent robotics
Chengmin Zhou
Bingding Huang
Pasi Fränti
56
4
0
04 Feb 2021
Variance Penalized On-Policy and Off-Policy Actor-Critic
Arushi Jain
Gandharv Patil
Ayush Jain
Khimya Khetarpal
Doina Precup
OffRL
55
10
0
03 Feb 2021
A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants
Zaiwei Chen
S. T. Maguluri
Sanjay Shakkottai
Karthikeyan Shanmugam
OffRL
213
55
0
02 Feb 2021
A Survey on Deep Reinforcement Learning for Audio-Based Applications
S. Latif
Heriberto Cuayáhuitl
Farrukh Pervez
Fahad Shamshad
Hafiz Shehbaz Ali
Min Zhang
OffRL
123
75
0
01 Jan 2021
Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation
Ruibo Liu
Guangxuan Xu
Chenyan Jia
Weicheng Ma
Lili Wang
Soroush Vosoughi
82
110
0
05 Dec 2020
Pareto Deterministic Policy Gradients and Its Application in 5G Massive MIMO Networks
Zhou Zhou
Yan Xin
Hao Chen
C. Zhang
Lingjia Liu
8
1
0
02 Dec 2020
Diluted Near-Optimal Expert Demonstrations for Guiding Dialogue Stochastic Policy Optimisation
Thibault Cordier
Tanguy Urvoy
L. Rojas-Barahona
F. Lefèvre
128
5
0
25 Nov 2020
Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models
Yuchen Wu
Melissa Mozifian
Florian Shkurti
87
21
0
02 Nov 2020
Behavior Priors for Efficient Reinforcement Learning
Dhruva Tirumala
Alexandre Galashov
Hyeonwoo Noh
Leonard Hasenclever
Razvan Pascanu
...
Guillaume Desjardins
Wojciech M. Czarnecki
Arun Ahuja
Yee Whye Teh
N. Heess
114
40
0
27 Oct 2020
Optimal Off-Policy Evaluation from Multiple Logging Policies
Nathan Kallus
Yuta Saito
Masatoshi Uehara
OffRL
74
40
0
21 Oct 2020
Iterative Amortized Policy Optimization
Joseph Marino
Alexandre Piché
Alessandro Davide Ialongo
Yisong Yue
OffRL
117
21
0
20 Oct 2020
Human-centric Dialog Training via Offline Reinforcement Learning
Natasha Jaques
J. Shen
Asma Ghandeharioun
Craig Ferguson
Àgata Lapedriza
Noah J. Jones
S. Gu
Rosalind W. Picard
OffRL
84
96
0
12 Oct 2020
Reinforcement Learning with Random Delays
Simon Ramstedt
Yann Bouteiller
Giovanni Beltrame
C. Pal
Jonathan Binas
227
61
0
06 Oct 2020
Lucid Dreaming for Experience Replay: Refreshing Past States with the Current Policy
Yunshu Du
Garrett A. Warnell
A. Gebremedhin
Peter Stone
Matthew E. Taylor
58
11
0
29 Sep 2020
Efficient Reinforcement Learning Development with RLzoo
Zihan Ding
Tianyang Yu
Yanhua Huang
Hongming Zhang
Guo Li
Quancheng Guo
Kai Zou
Hao Dong
OffRL, OnRL
32
6
0
18 Sep 2020
Importance Weighted Policy Learning and Adaptation
Alexandre Galashov
Jakub Sygnowski
Guillaume Desjardins
Jan Humplik
Leonard Hasenclever
Rae Jeong
Yee Whye Teh
N. Heess
OffRL
85
1
0
10 Sep 2020
Beyond variance reduction: Understanding the true impact of baselines on policy optimization
Wesley Chung
Valentin Thomas
Marlos C. Machado
Nicolas Le Roux
OffRL
114
24
0
31 Aug 2020
Machine Learning for Reliability Engineering and Safety Applications: Review of Current Status and Future Opportunities
Zhaoyi Xu
J. Saleh
79
352
0
19 Aug 2020
Off-Policy Multi-Agent Decomposed Policy Gradients
Yihan Wang
Beining Han
Tonghan Wang
Heng Dong
Chongjie Zhang
100
181
0
24 Jul 2020
Hyperparameter Selection for Offline Reinforcement Learning
T. Paine
Cosmin Paduraru
Andrea Michi
Çağlar Gülçehre
Konrad Zolna
Alexander Novikov
Ziyun Wang
Nando de Freitas
GP, OffRL
201
148
0
17 Jul 2020
Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning
Sabrina Hoppe
Marc Toussaint
OffRL
44
7
0
15 Jul 2020
Revisiting Fundamentals of Experience Replay
W. Fedus
Prajit Ramachandran
Rishabh Agarwal
Yoshua Bengio
Hugo Larochelle
Mark Rowland
Will Dabney
KELM, OffRL
97
242
0
13 Jul 2020
Off-Policy Evaluation via the Regularized Lagrangian
Mengjiao Yang
Ofir Nachum
Bo Dai
Lihong Li
Dale Schuurmans
OffRL
56
118
0
07 Jul 2020
Counterfactual Data Augmentation using Locally Factored Dynamics
Silviu Pitis
Elliot Creager
Animesh Garg
BDL, OffRL
111
89
0
06 Jul 2020
Decentralized Deep Reinforcement Learning for Network Level Traffic Signal Control
Jinqiu Guo
23
1
0
02 Jul 2020
Gradient Temporal-Difference Learning with Regularized Corrections
Sina Ghiassian
Andrew Patterson
Shivam Garg
Dhawal Gupta
Adam White
Martha White
172
42
0
01 Jul 2020
A Unifying Framework for Reinforcement Learning and Planning
Thomas M. Moerland
Joost Broekens
Aske Plaat
Catholijn M. Jonker
OffRL
131
9
0
26 Jun 2020
Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
Benjamin Eysenbach
Swapnil Asawa
Shreyas Chaudhari
Sergey Levine
Ruslan Salakhutdinov
106
94
0
24 Jun 2020
Experience Replay with Likelihood-free Importance Weights
Samarth Sinha
Jiaming Song
Animesh Garg
Stefano Ermon
OffRL
102
58
0
23 Jun 2020
Off-Policy Self-Critical Training for Transformer in Visual Paragraph Generation
Shiyang Yan
Yang Hua
N. Robertson
OffRL
42
0
0
21 Jun 2020
Self-Imitation Learning via Generalized Lower Bound Q-learning
Yunhao Tang
SSL
108
24
0
12 Jun 2020
Self-Supervised Reinforcement Learning for Recommender Systems
Xin Xin
Alexandros Karatzoglou
Ioannis Arapakis
J. Jose
SSL, OffRL
160
202
0
10 Jun 2020