1702.03006
Multi-step Off-policy Learning Without Importance Sampling Ratios
9 February 2017
A. R. Mahmood
Huizhen Yu
R. Sutton
OffRL
Papers citing "Multi-step Off-policy Learning Without Importance Sampling Ratios"
31 / 31 papers shown
Title
Two-Step Q-Learning
Antony Vijesh
Shreyas Sumithra Rudresha
OffRL
16
1
0
02 Jul 2024
Demystifying the Recency Heuristic in Temporal-Difference Learning
Brett Daley
Marlos C. Machado
Martha White
30
1
0
18 Jun 2024
Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation
Fengdi Che
Chenjun Xiao
Jincheng Mei
Bo Dai
Ramki Gummadi
Oscar A Ramirez
Christopher K Harris
A. R. Mahmood
Dale Schuurmans
38
5
0
31 May 2024
Analysis of Off-Policy Multi-Step TD-Learning with Linear Function Approximation
Donghwan Lee
45
0
0
24 Feb 2024
K-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic Control
Michael Giegrich
Roel Oomen
C. Reisinger
OffRL
27
2
0
07 Jun 2023
The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation
Mark Rowland
Yunhao Tang
Clare Lyle
Rémi Munos
Marc G. Bellemare
Will Dabney
13
10
0
28 May 2023
Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning
Brett Daley
Martha White
Chris Amato
Marlos C. Machado
OffRL
19
3
0
26 Jan 2023
The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning
Yunhao Tang
Mark Rowland
Rémi Munos
Bernardo Avila-Pires
Will Dabney
Marc G. Bellemare
OffRL
24
11
0
15 Jul 2022
Importance Sampling Placement in Off-Policy Temporal-Difference Methods
Eric Graves
Sina Ghiassian
OffRL
23
2
0
18 Mar 2022
Continual Auxiliary Task Learning
Matt McLeod
Chun-Ping Lo
M. Schlegel
Andrew Jacobsen
Raksha Kumaraswamy
Martha White
Adam White
CLL
24
8
0
22 Feb 2022
Improving the Efficiency of Off-Policy Reinforcement Learning by Accounting for Past Decisions
Brett Daley
Chris Amato
OffRL
23
1
0
23 Dec 2021
An Empirical Comparison of Off-policy Prediction Learning Algorithms in the Four Rooms Environment
Sina Ghiassian
R. Sutton
AAML
OffRL
13
6
0
10 Sep 2021
Learning Expected Emphatic Traces for Deep RL
Ray Jiang
Shangtong Zhang
Veronica Chelu
Adam White
Hado van Hasselt
OffRL
19
12
0
12 Jul 2021
Emphatic Algorithms for Deep Reinforcement Learning
Ray Jiang
Tom Zahavy
Zhongwen Xu
Adam White
Matteo Hessel
Charles Blundell
Hado van Hasselt
OffRL
38
19
0
21 Jun 2021
A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation
Scott Fujimoto
D. Meger
Doina Precup
8
16
0
12 Jun 2021
An Empirical Comparison of Off-policy Prediction Learning Algorithms on the Collision Task
Sina Ghiassian
R. Sutton
AAML
OffRL
19
5
0
02 Jun 2021
On Convergence of Gradient Expected Sarsa(λ)
Long Yang
Gang Zheng
Yu Zhang
Qian Zheng
Pengfei Li
Gang Pan
21
2
0
14 Dec 2020
Gradient Temporal-Difference Learning with Regularized Corrections
Sina Ghiassian
Andrew Patterson
Shivam Garg
Dhawal Gupta
Adam White
Martha White
10
42
0
01 Jul 2020
Self-Imitation Learning via Generalized Lower Bound Q-learning
Yunhao Tang
SSL
33
24
0
12 Jun 2020
Adaptive Trade-Offs in Off-Policy Learning
Mark Rowland
Will Dabney
Rémi Munos
OffRL
25
22
0
16 Oct 2019
Expected Sarsa(λ) with Control Variate for Variance Reduction
Long Yang
Yu Zhang
Jun Wen
Qian Zheng
Pengfei Li
Gang Pan
22
0
0
25 Jun 2019
Importance Resampling for Off-policy Prediction
M. Schlegel
Wesley Chung
Daniel Graves
Jian Qian
Martha White
OffRL
6
41
0
11 Jun 2019
Policy Certificates: Towards Accountable Reinforcement Learning
Christoph Dann
Ashutosh Adhikari
Wei Wei
Jimmy J. Lin
OffRL
6
140
0
07 Nov 2018
Online Off-policy Prediction
Sina Ghiassian
D. Paul
M. Fasoulakis
R. Sutton
Adam White
OffRL
8
23
0
06 Nov 2018
Per-decision Multi-step Temporal Difference Learning with Control Variates
Kristopher De Asis
R. Sutton
14
7
0
05 Jul 2018
Evolution-Guided Policy Gradient in Reinforcement Learning
Shauharda Khadka
Kagan Tumer
19
223
0
21 May 2018
Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto
H. V. Hoof
D. Meger
OffRL
19
5,063
0
26 Feb 2018
On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning
Huizhen Yu
OffRL
16
32
0
27 Dec 2017
Learning with Options that Terminate Off-Policy
Anna Harutyunyan
Peter Vrancx
Pierre-Luc Bacon
Doina Precup
A. Nowé
OffRL
24
28
0
10 Nov 2017
Convergent Tree Backup and Retrace with Function Approximation
Ahmed Touati
Pierre-Luc Bacon
Doina Precup
Pascal Vincent
22
40
0
25 May 2017
On Generalized Bellman Equations and Temporal-Difference Learning
Huizhen Yu
A. R. Mahmood
R. Sutton
17
29
0
14 Apr 2017