Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration
31 January 2022 · arXiv:2202.00076
Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang · OffRL

Papers citing "Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration" (14 of 14 papers shown)
  • Optimal policy evaluation using kernel-based temporal difference methods
    Yaqi Duan, Mengdi Wang, Martin J. Wainwright · OffRL · 24 Sep 2021
  • Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
    Andrea Zanette, Martin J. Wainwright, Emma Brunskill · OffRL · 19 Aug 2021
  • Bellman-consistent Pessimism for Offline Reinforcement Learning
    Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal · OffRL, LRM · 13 Jun 2021
  • Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage
    Jonathan D. Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun · OffRL · 06 Jun 2021
  • Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
    Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart J. Russell · OffRL · 22 Mar 2021
  • On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method
    Junyu Zhang, Chengzhuo Ni, Zheng Yu, Csaba Szepesvári, Mengdi Wang · 17 Feb 2021
  • Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
    Botao Hao, X. Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, Mengdi Wang · OffRL · 06 Feb 2021
  • Is Pessimism Provably Efficient for Offline RL?
    Ying Jin, Zhuoran Yang, Zhaoran Wang · OffRL · 30 Dec 2020
  • Variational Policy Gradient Method for Reinforcement Learning with General Utilities
    Junyu Zhang, Alec Koppel, Amrit Singh Bedi, Csaba Szepesvári, Mengdi Wang · 04 Jul 2020
  • Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
    Tengyang Xie, Yifei Ma, Yu-Xiang Wang · OffRL · 08 Jun 2019
  • Global Optimality Guarantees For Policy Gradient Methods
    Jalaj Bhandari, Daniel Russo · 05 Jun 2019
  • An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient
    Pan Xu, F. Gao, Quanquan Gu · 29 May 2019
  • Off-Policy Policy Gradient with State Distribution Correction
    Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill · OffRL · 17 Apr 2019
  • Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
    Philip S. Thomas, Emma Brunskill · OffRL · 04 Apr 2016