ResearchTrend.AI
Safe and Efficient Off-Policy Reinforcement Learning

8 June 2016
Rémi Munos
T. Stepleton
Anna Harutyunyan
Marc G. Bellemare
    OffRL

Papers citing "Safe and Efficient Off-Policy Reinforcement Learning"

50 / 374 papers shown
Ensemble-MIX: Enhancing Sample Efficiency in Multi-Agent RL Using Ensemble Methods
Tom Danino
Nahum Shimkin
55
0
0
03 Jun 2025
ShiQ: Bringing back Bellman to LLMs
Pierre Clavier
Nathan Grinsztajn
Raphaël Avalos
Yannis Flet-Berliac
Irem Ergun
...
Eugene Tarassov
Olivier Pietquin
Pierre Harvey Richemond
Florian Strub
Matthieu Geist
OffRL
64
0
0
16 May 2025
Automatic Reward Shaping from Confounded Offline Data
Mingxuan Li
Junzhe Zhang
Elias Bareinboim
OffRL, OnRL
108
0
0
16 May 2025
Trust-Region Twisted Policy Improvement
Joery A. de Vries
Jinke He
Yaniv Oren
M. Spaan
OffRL, LRM
133
0
0
08 Apr 2025
Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do
Yoav Wald
M. Goldstein
Yonathan Efroni
Wouter A. C. van Amsterdam
Rajesh Ranganath
CML
179
0
0
20 Mar 2025
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
172
6
0
18 Mar 2025
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
Taiyi Wang
Zhihao Wu
Jianheng Liu
Jianye Hao
Jun Wang
Kun Shao
OffRL
122
29
0
24 Feb 2025
Actor Critic with Experience Replay-based automatic treatment planning for prostate cancer intensity modulated radiotherapy
Md Mainul Abrar
Parvat Sapkota
Damon Sprouts
Xun Jia
Yujie Chi
OffRL
63
0
0
01 Feb 2025
Divergence-Augmented Policy Optimization
Qing Wang
Yingru Li
Jiechao Xiong
Tong Zhang
OffRL
174
16
0
28 Jan 2025
GraCo -- A Graph Composer for Integrated Circuits
Stefan Uhlich
Andrea Bonetti
Arun Venkitaraman
Ali Momeni
Ryoga Matsuo
Chia-Yu Hsieh
Eisaku Ohbuchi
Lorenzo Servadei
GNN
155
2
0
21 Nov 2024
A Safety Modulator Actor-Critic Method in Model-Free Safe Reinforcement Learning and Application in UAV Hovering
Qihan Qi
Xinsong Yang
Gang Xia
Daniel W. C. Ho
Pengyang Tang
94
0
0
09 Oct 2024
Compatible Gradient Approximations for Actor-Critic Algorithms
Baturay Saglam
Dionysis Kalogerias
134
0
0
02 Sep 2024
Simplifying Deep Temporal Difference Learning
Matteo Gallici
Mattie Fellows
Benjamin Ellis
B. Pou
Ivan Masmitja
Jakob Foerster
Mario Martin
OffRL
165
26
0
05 Jul 2024
Two-Step Q-Learning
Antony Vijesh
Shreyas Sumithra Rudresha
OffRL
93
1
0
02 Jul 2024
Demystifying the Recency Heuristic in Temporal-Difference Learning
Brett Daley
Marlos C. Machado
Martha White
72
1
0
18 Jun 2024
WPO: Enhancing RLHF with Weighted Preference Optimization
Wenxuan Zhou
Ravi Agrawal
Shujian Zhang
Sathish Indurthi
Sanqiang Zhao
Kaiqiang Song
Silei Xu
Chenguang Zhu
105
20
0
17 Jun 2024
Transcendence: Generative Models Can Outperform The Experts That Train Them
Edwin Zhang
Vincent Zhu
Naomi Saphra
Anat Kleiman
Benjamin L. Edelman
Milind Tambe
Sham Kakade
Eran Malach
120
15
0
17 Jun 2024
Reflective Policy Optimization
Yaozhong Gan
Renye Yan
Zhe Wu
Junliang Xing
84
1
0
06 Jun 2024
Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation
Shangding Gu
Laixi Shi
Yuhao Ding
Alois Knoll
C. Spanos
Adam Wierman
Ming Jin
OffRL
88
2
0
31 May 2024
OMPO: A Unified Framework for RL under Policy and Dynamics Shifts
Yu-Juan Luo
Tianying Ji
Gang Hua
Jianwei Zhang
Huazhe Xu
Xianyuan Zhan
OffRL
111
3
0
29 May 2024
Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies
Haanvid Lee
Tri Wahyu Guntara
Jongmin Lee
Yung-Kyun Noh
Kee-Eung Kim
OffRL
64
1
0
29 May 2024
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
Yu-Juan Luo
Tianying Ji
Gang Hua
Jianwei Zhang
Huazhe Xu
Xianyuan Zhan
OffRL, OnRL
108
3
0
28 May 2024
Highway Reinforcement Learning
Yuhui Wang
M. Strupl
Francesco Faccio
Qingyuan Wu
Haozhe Liu
Michal Grudzień
Xiaoyang Tan
Jürgen Schmidhuber
OffRL
73
4
0
28 May 2024
Deep Dive into Model-free Reinforcement Learning for Biological and Robotic Systems: Theory and Practice
Yusheng Jiao
Feng Ling
Sina Heydari
N. Heess
J. Merel
Eva Kanso
64
1
0
19 May 2024
Towards Robust Policy: Enhancing Offline Reinforcement Learning with Adversarial Attacks and Defenses
Thanh Nguyen
Tung M. Luu
Tri Ton
Chang D. Yoo
OffRL, AAML
84
0
0
18 May 2024
Adaptive Exploration for Data-Efficient General Value Function Evaluations
Arushi Jain
Josiah P. Hanna
Doina Precup
59
2
0
13 May 2024
Shared learning of powertrain control policies for vehicle fleets
Lindsey Kerbel
B. Ayalew
Andrej Ivanco
72
1
0
27 Apr 2024
Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning
Dohyeong Kim
Mineui Hong
Jeongho Park
Songhwai Oh
76
0
0
01 Mar 2024
Offline Fictitious Self-Play for Competitive Games
Jingxiao Chen
Weiji Xie
Weinan Zhang
Yong Zu
Ying Wen
OffRL
79
0
0
29 Feb 2024
Skill or Luck? Return Decomposition via Advantage Functions
Hsiao-Ru Pan
Bernhard Schölkopf
OffRL
40
5
0
20 Feb 2024
Discovering Command and Control (C2) Channels on Tor and Public Networks Using Reinforcement Learning
Cheng Wang
Christopher Redino
Abdul Rahman
Ryan Clark
Dan Radke
Tyler Cody
Dhruv Nandakumar
Edward Bowen
59
3
0
14 Feb 2024
Off-policy Distributional Q(λ): Distributional RL without Importance Sampling
Yunhao Tang
Mark Rowland
Rémi Munos
Bernardo Avila-Pires
Will Dabney
OffRL
60
1
0
08 Feb 2024
Deep autoregressive density nets vs neural ensembles for model-based offline reinforcement learning
Abdelhakim Benechehab
Albert Thomas
Balázs Kégl
OffRL
67
2
0
05 Feb 2024
Towards Off-Policy Reinforcement Learning for Ranking Policies with Human Feedback
Teng Xiao
Suhang Wang
OffRL
73
8
0
17 Jan 2024
Neural Population Learning beyond Symmetric Zero-sum Games
Siqi Liu
Luke Marris
Marc Lanctot
Georgios Piliouras
Joel Z Leibo
N. Heess
MLT
95
3
0
10 Jan 2024
Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic Approximation with Markovian Noise
Shaan ul Haque
S. Khodadadian
S. T. Maguluri
138
11
0
31 Dec 2023
TAPE: Leveraging Agent Topology for Cooperative Multi-Agent Policy Gradient
Xingzhou Lou
Junge Zhang
Timothy J. Norman
Kaiqi Huang
Yali Du
70
1
0
25 Dec 2023
Probabilistic Offline Policy Ranking with Approximate Bayesian Computation
Longchao Da
Porter Jenkins
Trevor Schwantes
Jeffrey Dotson
Hua Wei
OffRL
54
2
0
17 Dec 2023
Stochastic Optimal Control Matching
Carles Domingo-Enrich
Jiequn Han
Brandon Amos
Joan Bruna
Ricky T. Q. Chen
DiffM
122
10
0
04 Dec 2023
Bias Resilient Multi-Step Off-Policy Goal-Conditioned Reinforcement Learning
Lisheng Wu
Ke Chen
64
0
0
29 Nov 2023
Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning
Ryan Shea
Zhou Yu
OffRL
97
8
0
16 Oct 2023
Distill Knowledge in Multi-task Reinforcement Learning with Optimal-Transport Regularization
Bang Giang Le
Viet-Cuong Ta
OT
81
1
0
27 Sep 2023
Hybrid of representation learning and reinforcement learning for dynamic and complex robotic motion planning
Chengmin Zhou
Xin Lu
Jiapeng Dai
Bingding Huang
Xiaoxu Liu
Pasi Fränti
73
2
0
07 Sep 2023
Counterfactual Explanation Policies in RL
Shripad Deshmukh
R Srivatsan
Supriti Vijay
Jayakumar Subramanian
Chirag Agarwal
OffRL
59
0
0
25 Jul 2023
Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning
Qiang He
Dinesh Manocha
Meng Fang
S. Maghsudi
76
5
0
29 Jun 2023
Value-aware Importance Weighting for Off-policy Reinforcement Learning
Kristopher De Asis
Eric Graves
R. Sutton
OffRL
58
1
0
27 Jun 2023
Bootstrapped Representations in Reinforcement Learning
Charline Le Lan
Stephen Tu
Mark Rowland
Anna Harutyunyan
Rishabh Agarwal
Marc G. Bellemare
Will Dabney
OffRL, OOD, SSL
138
10
0
16 Jun 2023
Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second
Vincent-Pierre Berges
Andrew Szot
Devendra Singh Chaplot
Aaron Gokaslan
Roozbeh Mottaghi
Dhruv Batra
Eric Undersander
LRM, LM&Ro
96
5
0
13 Jun 2023
DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm
Yunhao Tang
Tadashi Kozuno
Mark Rowland
Anna Harutyunyan
Rémi Munos
Bernardo Avila-Pires
Michal Valko
27
0
0
29 May 2023
Learnable Behavior Control: Breaking Atari Human World Records via Sample-Efficient Behavior Selection
Jiajun Fan
Yuzheng Zhuang
Yuecheng Liu
Jianye Hao
Bin Wang
Jiangcheng Zhu
Hao Wang
Shutao Xia
72
18
0
09 May 2023