Multi-step Off-policy Learning Without Importance Sampling Ratios
9 February 2017
A. R. Mahmood, Huizhen Yu, R. Sutton
OffRL

Papers citing "Multi-step Off-policy Learning Without Importance Sampling Ratios"

31 / 31 papers shown

  1. Two-Step Q-Learning (OffRL). Antony Vijesh, Shreyas Sumithra Rudresha. 02 Jul 2024
  2. Demystifying the Recency Heuristic in Temporal-Difference Learning. Brett Daley, Marlos C. Machado, Martha White. 18 Jun 2024
  3. Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation. Fengdi Che, Chenjun Xiao, Jincheng Mei, Bo Dai, Ramki Gummadi, Oscar A Ramirez, Christopher K Harris, A. R. Mahmood, Dale Schuurmans. 31 May 2024
  4. Analysis of Off-Policy Multi-Step TD-Learning with Linear Function Approximation. Donghwan Lee. 24 Feb 2024
  5. $K$-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic Control (OffRL). Michael Giegrich, Roel Oomen, C. Reisinger. 07 Jun 2023
  6. The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation. Mark Rowland, Yunhao Tang, Clare Lyle, Rémi Munos, Marc G. Bellemare, Will Dabney. 28 May 2023
  7. Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning (OffRL). Brett Daley, Martha White, Chris Amato, Marlos C. Machado. 26 Jan 2023
  8. The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning (OffRL). Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Avila-Pires, Will Dabney, Marc G. Bellemare. 15 Jul 2022
  9. Importance Sampling Placement in Off-Policy Temporal-Difference Methods (OffRL). Eric Graves, Sina Ghiassian. 18 Mar 2022
  10. Continual Auxiliary Task Learning (CLL). Matt McLeod, Chun-Ping Lo, M. Schlegel, Andrew Jacobsen, Raksha Kumaraswamy, Martha White, Adam White. 22 Feb 2022
  11. Improving the Efficiency of Off-Policy Reinforcement Learning by Accounting for Past Decisions (OffRL). Brett Daley, Chris Amato. 23 Dec 2021
  12. An Empirical Comparison of Off-policy Prediction Learning Algorithms in the Four Rooms Environment (AAML, OffRL). Sina Ghiassian, R. Sutton. 10 Sep 2021
  13. Learning Expected Emphatic Traces for Deep RL (OffRL). Ray Jiang, Shangtong Zhang, Veronica Chelu, Adam White, Hado van Hasselt. 12 Jul 2021
  14. Emphatic Algorithms for Deep Reinforcement Learning (OffRL). Ray Jiang, Tom Zahavy, Zhongwen Xu, Adam White, Matteo Hessel, Charles Blundell, Hado van Hasselt. 21 Jun 2021
  15. A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation. Scott Fujimoto, D. Meger, Doina Precup. 12 Jun 2021
  16. An Empirical Comparison of Off-policy Prediction Learning Algorithms on the Collision Task (AAML, OffRL). Sina Ghiassian, R. Sutton. 02 Jun 2021
  17. On Convergence of Gradient Expected Sarsa($λ$). Long Yang, Gang Zheng, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan. 14 Dec 2020
  18. Gradient Temporal-Difference Learning with Regularized Corrections. Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White. 01 Jul 2020
  19. Self-Imitation Learning via Generalized Lower Bound Q-learning (SSL). Yunhao Tang. 12 Jun 2020
  20. Adaptive Trade-Offs in Off-Policy Learning (OffRL). Mark Rowland, Will Dabney, Rémi Munos. 16 Oct 2019
  21. Expected Sarsa($λ$) with Control Variate for Variance Reduction. Long Yang, Yu Zhang, Jun Wen, Qian Zheng, Pengfei Li, Gang Pan. 25 Jun 2019
  22. Importance Resampling for Off-policy Prediction (OffRL). M. Schlegel, Wesley Chung, Daniel Graves, Jian Qian, Martha White. 11 Jun 2019
  23. Policy Certificates: Towards Accountable Reinforcement Learning (OffRL). Christoph Dann, Ashutosh Adhikari, Wei Wei, Jimmy J. Lin. 07 Nov 2018
  24. Online Off-policy Prediction (OffRL). Sina Ghiassian, D. Paul, M. Fasoulakis, R. Sutton, Adam White. 06 Nov 2018
  25. Per-decision Multi-step Temporal Difference Learning with Control Variates. Kristopher De Asis, R. Sutton. 05 Jul 2018
  26. Evolution-Guided Policy Gradient in Reinforcement Learning. Shauharda Khadka, Kagan Tumer. 21 May 2018
  27. Addressing Function Approximation Error in Actor-Critic Methods (OffRL). Scott Fujimoto, H. V. Hoof, D. Meger. 26 Feb 2018
  28. On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning (OffRL). Huizhen Yu. 27 Dec 2017
  29. Learning with Options that Terminate Off-Policy (OffRL). Anna Harutyunyan, Peter Vrancx, Pierre-Luc Bacon, Doina Precup, A. Nowé. 10 Nov 2017
  30. Convergent Tree Backup and Retrace with Function Approximation. Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent. 25 May 2017
  31. On Generalized Bellman Equations and Temporal-Difference Learning. Huizhen Yu, A. R. Mahmood, R. Sutton. 14 Apr 2017