Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2007.03438
Cited By
Off-Policy Evaluation via the Regularized Lagrangian
7 July 2020
Mengjiao Yang
Ofir Nachum
Bo Dai
Lihong Li
Dale Schuurmans
OffRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Off-Policy Evaluation via the Regularized Lagrangian"
18 / 18 papers shown
Title
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
161
1
0
26 Feb 2025
Finite-Sample Analysis of Proximal Gradient TD Algorithms
Bo Liu
Ji Liu
Mohammad Ghavamzadeh
Sridhar Mahadevan
Marek Petrik
43
158
0
06 Jun 2020
Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
Yaqi Duan
Mengdi Wang
OffRL
129
151
0
21 Feb 2020
GenDICE: Generalized Offline Estimation of Stationary Values
Ruiyi Zhang
Bo Dai
Lihong Li
Dale Schuurmans
OffRL
125
173
0
21 Feb 2020
GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values
Shangtong Zhang
Bo Liu
Shimon Whiteson
OffRL
26
103
0
29 Jan 2020
AlgaeDICE: Policy Gradient from Arbitrary Experience
Ofir Nachum
Bo Dai
Ilya Kostrikov
Yinlam Chow
Lihong Li
Dale Schuurmans
OffRL
79
240
0
04 Dec 2019
Minimax Weight and Q-Function Learning for Off-Policy Evaluation
Masatoshi Uehara
Jiawei Huang
Nan Jiang
OffRL
92
186
0
28 Oct 2019
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
Ziyang Tang
Yihao Feng
Lihong Li
Dengyong Zhou
Qiang Liu
OffRL
88
68
0
16 Oct 2019
Faster saddle-point optimization for solving large-scale Markov decision processes
Joan Bas-Serrano
Gergely Neu
53
13
0
22 Sep 2019
Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes
Nathan Kallus
Masatoshi Uehara
OffRL
66
185
0
22 Aug 2019
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
Ofir Nachum
Yinlam Chow
Bo Dai
Lihong Li
OffRL
87
332
0
10 Jun 2019
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
Qiang Liu
Lihong Li
Ziyang Tang
Dengyong Zhou
OffRL
91
354
0
29 Oct 2018
Scalable Bilinear
π
π
π
Learning Using State and Action Features
Yichen Chen
Lihong Li
Mengdi Wang
33
46
0
27 Apr 2018
More Robust Doubly Robust Off-policy Evaluation
Mehrdad Farajtabar
Yinlam Chow
Mohammad Ghavamzadeh
OffRL
51
267
0
10 Feb 2018
Stochastic Variance Reduction Methods for Policy Evaluation
S. Du
Jianshu Chen
Lihong Li
Lin Xiao
Dengyong Zhou
OffRL
27
156
0
25 Feb 2017
Safe and Efficient Off-Policy Reinforcement Learning
Rémi Munos
T. Stepleton
Anna Harutyunyan
Marc G. Bellemare
OffRL
119
611
0
08 Jun 2016
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
Nan Jiang
Lihong Li
OffRL
136
621
0
11 Nov 2015
Deep Reinforcement Learning with Double Q-learning
H. V. Hasselt
A. Guez
David Silver
OffRL
131
7,590
0
22 Sep 2015
1