Average-Reward Off-Policy Policy Evaluation with Function Approximation

8 January 2021
Shangtong Zhang
Yi Wan
R. Sutton
Shimon Whiteson
    OffRL

Papers citing "Average-Reward Off-Policy Policy Evaluation with Function Approximation"

21 / 21 papers shown
Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features
Zixuan Xie
Xinyu Liu
Rohan Chandra
Shangtong Zhang
45
0
0
27 May 2025
Learning and Planning in Average-Reward Markov Decision Processes
Yi Wan
A. Naik
R. Sutton
OffRL
69
61
0
29 Jun 2020
MOReL: Model-Based Offline Reinforcement Learning
Rahul Kidambi
Aravind Rajeswaran
Praneeth Netrapalli
Thorsten Joachims
OffRL
111
677
0
12 May 2020
Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning
Shangtong Zhang
Bo Liu
Shimon Whiteson
92
38
0
22 Apr 2020
GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values
Shangtong Zhang
Bo Liu
Shimon Whiteson
OffRL
97
103
0
29 Jan 2020
AlgaeDICE: Policy Gradient from Arbitrary Experience
Ofir Nachum
Bo Dai
Ilya Kostrikov
Yinlam Chow
Lihong Li
Dale Schuurmans
OffRL
166
244
0
04 Dec 2019
A Convergent Off-Policy Temporal Difference Algorithm
Raghuram Bharadwaj Diddigi
Chandramouli Kamanchi
S. Bhatnagar
OffRL
37
8
0
13 Nov 2019
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
Ziyang Tang
Yihao Feng
Lihong Li
Dengyong Zhou
Qiang Liu
OffRL
164
69
0
16 Oct 2019
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
Ofir Nachum
Yinlam Chow
Bo Dai
Lihong Li
OffRL
155
338
0
10 Jun 2019
Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
Tengyang Xie
Yifei Ma
Yu Wang
OffRL
115
181
0
08 Jun 2019
Combining Parametric and Nonparametric Models for Off-Policy Evaluation
Omer Gottesman
Yao Liu
Scott Sussex
Emma Brunskill
Finale Doshi-Velez
OffRL
100
36
0
14 May 2019
Challenges of Real-World Reinforcement Learning
Gabriel Dulac-Arnold
D. Mankowitz
Todd Hester
OffRL
99
551
0
29 Apr 2019
Planning with Expectation Models
Yi Wan
M. Zaheer
Adam White
Martha White
R. Sutton
OffRL
76
24
0
02 Apr 2019
Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift
Carles Gelada
Marc G. Bellemare
OffRL
73
99
0
27 Jan 2019
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
Qiang Liu
Lihong Li
Ziyang Tang
Dengyong Zhou
OffRL
177
356
0
29 Oct 2018
Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto
H. V. Hoof
David Meger
OffRL
200
5,226
0
26 Feb 2018
On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning
Huizhen Yu
OffRL
99
32
0
27 Dec 2017
Deep Reinforcement Learning that Matters
Peter Henderson
Riashat Islam
Philip Bachman
Joelle Pineau
Doina Precup
David Meger
OffRL
147
1,963
0
19 Sep 2017
An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
R. Sutton
A. R. Mahmood
Martha White
98
272
0
14 Mar 2015
Distributed Policy Evaluation Under Multiple Behavior Strategies
Sergio Valcarcel Macua
Jianshu Chen
S. Zazo
Ali H. Sayed
130
104
0
30 Dec 2013
Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping
R. Sutton
Csaba Szepesvári
A. Geramifard
Michael Bowling
OffRL
98
204
0
13 Jun 2012