ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2007.03438
  4. Cited By
Off-Policy Evaluation via the Regularized Lagrangian

Off-Policy Evaluation via the Regularized Lagrangian

7 July 2020
Mengjiao Yang
Ofir Nachum
Bo Dai
Lihong Li
Dale Schuurmans
    OffRL
ArXivPDFHTML

Papers citing "Off-Policy Evaluation via the Regularized Lagrangian"

18 / 18 papers shown
Title
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
161
1
0
26 Feb 2025
Finite-Sample Analysis of Proximal Gradient TD Algorithms
Finite-Sample Analysis of Proximal Gradient TD Algorithms
Bo Liu
Ji Liu
Mohammad Ghavamzadeh
Sridhar Mahadevan
Marek Petrik
43
158
0
06 Jun 2020
Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
Yaqi Duan
Mengdi Wang
OffRL
129
151
0
21 Feb 2020
GenDICE: Generalized Offline Estimation of Stationary Values
GenDICE: Generalized Offline Estimation of Stationary Values
Ruiyi Zhang
Bo Dai
Lihong Li
Dale Schuurmans
OffRL
125
173
0
21 Feb 2020
GradientDICE: Rethinking Generalized Offline Estimation of Stationary
  Values
GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values
Shangtong Zhang
Bo Liu
Shimon Whiteson
OffRL
26
103
0
29 Jan 2020
AlgaeDICE: Policy Gradient from Arbitrary Experience
AlgaeDICE: Policy Gradient from Arbitrary Experience
Ofir Nachum
Bo Dai
Ilya Kostrikov
Yinlam Chow
Lihong Li
Dale Schuurmans
OffRL
79
240
0
04 Dec 2019
Minimax Weight and Q-Function Learning for Off-Policy Evaluation
Minimax Weight and Q-Function Learning for Off-Policy Evaluation
Masatoshi Uehara
Jiawei Huang
Nan Jiang
OffRL
92
186
0
28 Oct 2019
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
Ziyang Tang
Yihao Feng
Lihong Li
Dengyong Zhou
Qiang Liu
OffRL
88
68
0
16 Oct 2019
Faster saddle-point optimization for solving large-scale Markov decision
  processes
Faster saddle-point optimization for solving large-scale Markov decision processes
Joan Bas-Serrano
Gergely Neu
53
13
0
22 Sep 2019
Double Reinforcement Learning for Efficient Off-Policy Evaluation in
  Markov Decision Processes
Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes
Nathan Kallus
Masatoshi Uehara
OffRL
66
185
0
22 Aug 2019
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary
  Distribution Corrections
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
Ofir Nachum
Yinlam Chow
Bo Dai
Lihong Li
OffRL
87
332
0
10 Jun 2019
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
Qiang Liu
Lihong Li
Ziyang Tang
Dengyong Zhou
OffRL
91
354
0
29 Oct 2018
Scalable Bilinear $π$ Learning Using State and Action Features
Scalable Bilinear πππ Learning Using State and Action Features
Yichen Chen
Lihong Li
Mengdi Wang
33
46
0
27 Apr 2018
More Robust Doubly Robust Off-policy Evaluation
More Robust Doubly Robust Off-policy Evaluation
Mehrdad Farajtabar
Yinlam Chow
Mohammad Ghavamzadeh
OffRL
51
267
0
10 Feb 2018
Stochastic Variance Reduction Methods for Policy Evaluation
Stochastic Variance Reduction Methods for Policy Evaluation
S. Du
Jianshu Chen
Lihong Li
Lin Xiao
Dengyong Zhou
OffRL
27
156
0
25 Feb 2017
Safe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement Learning
Rémi Munos
T. Stepleton
Anna Harutyunyan
Marc G. Bellemare
OffRL
119
611
0
08 Jun 2016
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
Nan Jiang
Lihong Li
OffRL
136
621
0
11 Nov 2015
Deep Reinforcement Learning with Double Q-learning
Deep Reinforcement Learning with Double Q-learning
H. V. Hasselt
A. Guez
David Silver
OffRL
131
7,590
0
22 Sep 2015
1