ResearchTrend.AI
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation


29 October 2018
Qiang Liu
Lihong Li
Ziyang Tang
Dengyong Zhou
    OffRL

Papers citing "Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation"

50 / 252 papers shown
On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction
Jiawei Huang
Nan Jiang
98
5
0
02 Jun 2021
The Power of Log-Sum-Exp: Sequential Density Ratio Matrix Estimation for Speed-Accuracy Optimization
Taiki Miyagawa
Akinori F. Ebihara
50
3
0
28 May 2021
On Instrumental Variable Regression for Deep Offline Policy Evaluation
Yutian Chen
Liyuan Xu
Çağlar Gülçehre
T. Paine
Arthur Gretton
Nando de Freitas
Arnaud Doucet
OffRL
117
18
0
21 May 2021
Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
Ming Yin
Yu Wang
OffRL
97
19
0
13 May 2021
Deeply-Debiased Off-Policy Interval Estimation
C. Shi
Runzhe Wan
Victor Chernozhukov
R. Song
OffRL
53
38
0
10 May 2021
Towards Theoretical Understandings of Robust Markov Decision Processes: Sample Complexity and Asymptotics
Wenhao Yang
Liangyu Zhang
Zhihua Zhang
72
35
0
09 May 2021
Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
Michael Ruogu Zhang
T. Paine
Ofir Nachum
Cosmin Paduraru
George Tucker
Ziyun Wang
Mohammad Norouzi
OffRL
86
49
0
28 Apr 2021
Universal Off-Policy Evaluation
Yash Chandak
S. Niekum
Bruno C. da Silva
Erik Learned-Miller
Emma Brunskill
Philip S. Thomas
OffRL, ELM
101
53
0
26 Apr 2021
Nearly Horizon-Free Offline Reinforcement Learning
Zhaolin Ren
Jialian Li
Bo Dai
S. Du
Sujay Sanghavi
OffRL
83
49
0
25 Mar 2021
Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds
Yihao Feng
Ziyang Tang
Na Zhang
Qiang Liu
OffRL
73
14
0
09 Mar 2021
Instabilities of Offline RL with Pre-Trained Neural Representation
Ruosong Wang
Yifan Wu
Ruslan Salakhutdinov
Sham Kakade
OffRL
155
42
0
08 Mar 2021
On the Convergence and Optimality of Policy Gradient for Markov Coherent Risk
Audrey Huang
Liu Leqi
Zachary Chase Lipton
Kamyar Azizzadenesheli
77
21
0
04 Mar 2021
Minimax Model Learning
Cameron Voloshin
Nan Jiang
Yisong Yue
OffRL
112
18
0
02 Mar 2021
Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
Tengyu Xu
Zhuoran Yang
Zhaoran Wang
Yingbin Liang
OffRL
104
25
0
23 Feb 2021
Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
Botao Hao
X. Ji
Yaqi Duan
Hao Lu
Csaba Szepesvári
Mengdi Wang
OffRL
75
40
0
06 Feb 2021
Near-Optimal Offline Reinforcement Learning via Double Variance Reduction
Ming Yin
Yu Bai
Yu Wang
OffRL
93
69
0
02 Feb 2021
Fast Rates for the Regret of Offline Reinforcement Learning
Yichun Hu
Nathan Kallus
Masatoshi Uehara
OffRL
112
30
0
31 Jan 2021
High-Confidence Off-Policy (or Counterfactual) Variance Estimation
Yash Chandak
Shiv Shankar
Philip S. Thomas
OffRL
31
8
0
25 Jan 2021
Average-Reward Off-Policy Policy Evaluation with Function Approximation
Shangtong Zhang
Yi Wan
R. Sutton
Shimon Whiteson
OffRL
73
31
0
08 Jan 2021
SDA: Improving Text Generation with Self Data Augmentation
Ping Yu
Ruiyi Zhang
Yang Zhao
Yizhe Zhang
Chunyuan Li
Changyou Chen
38
2
0
02 Jan 2021
Is Pessimism Provably Efficient for Offline RL?
Ying Jin
Zhuoran Yang
Zhaoran Wang
OffRL
193
360
0
30 Dec 2020
The Variational Method of Moments
Andrew Bennett
Nathan Kallus
98
30
0
17 Dec 2020
Policy Optimization as Online Learning with Mediator Feedback
Alberto Maria Metelli
Matteo Papini
P. D'Oro
Marcello Restelli
OffRL
54
10
0
15 Dec 2020
Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL
Andrea Zanette
OffRL
206
71
0
14 Dec 2020
Offline Policy Selection under Uncertainty
Mengjiao Yang
Bo Dai
Ofir Nachum
George Tucker
Dale Schuurmans
OffRL
57
35
0
12 Dec 2020
Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies
Jinlin Lai
Lixin Zou
Jiaxing Song
OffRL
15
1
0
29 Nov 2020
C-Learning: Learning to Achieve Goals via Recursive Classification
Benjamin Eysenbach
Ruslan Salakhutdinov
Sergey Levine
OffRL
78
71
0
17 Nov 2020
Reliable Off-policy Evaluation for Reinforcement Learning
Jie Wang
Rui Gao
H. Zha
OffRL
79
11
0
08 Nov 2020
Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
Botao Hao
Yaqi Duan
Tor Lattimore
Csaba Szepesvári
Mengdi Wang
OffRL
142
27
0
08 Nov 2020
Off-Policy Interval Estimation with Lipschitz Value Iteration
Ziyang Tang
Yihao Feng
Na Zhang
Jian Peng
Qiang Liu
OffRL
47
6
0
29 Oct 2020
Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient
Samuele Tosatto
João Carvalho
Jan Peters
OffRL
62
7
0
27 Oct 2020
What are the Statistical Limits of Offline RL with Linear Function Approximation?
Ruosong Wang
Dean Phillips Foster
Sham Kakade
OffRL
175
164
0
22 Oct 2020
CoinDICE: Off-Policy Confidence Interval Estimation
Bo Dai
Ofir Nachum
Yinlam Chow
Lihong Li
Csaba Szepesvári
Dale Schuurmans
OffRL
76
87
0
22 Oct 2020
Optimal Off-Policy Evaluation from Multiple Logging Policies
Nathan Kallus
Yuta Saito
Masatoshi Uehara
OffRL
71
40
0
21 Oct 2020
Average-reward model-free reinforcement learning: a systematic review and literature mapping
Vektor Dewanto
George Dunn
A. Eshragh
M. Gallagher
Fred Roosta
81
30
0
18 Oct 2020
Instrumental Variable Regression via Kernel Maximum Moment Loss
Rui Zhang
Masaaki Imaizumi
Bernhard Schölkopf
Krikamol Muandet
89
9
0
15 Oct 2020
Offline Learning for Planning: A Summary
Giorgio Angelotti
Nicolas Drougard
Caroline Ponzoni Carvalho Chanel
OffRL
51
4
0
05 Oct 2020
Variance-Reduced Off-Policy Memory-Efficient Policy Search
Daoming Lyu
Qi Qi
Mohammad Ghavamzadeh
Hengshuai Yao
Tianbao Yang
Bo Liu
OffRL
71
7
0
14 Sep 2020
Accountable Off-Policy Evaluation With Kernel Bellman Statistics
Yihao Feng
Zhaolin Ren
Ziyang Tang
Qiang Liu
OffRL
141
44
0
15 Aug 2020
Batch Value-function Approximation with Only Realizability
Tengyang Xie
Nan Jiang
OffRL
404
121
0
11 Aug 2020
Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy
Zuyue Fu
Zhuoran Yang
Zhaoran Wang
87
43
0
02 Aug 2020
Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders
Andrew Bennett
Nathan Kallus
Lihong Li
Ali Mousavi
OffRL
79
44
0
27 Jul 2020
Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation
Ilya Kostrikov
Ofir Nachum
OffRL
52
31
0
27 Jul 2020
Batch Policy Learning in Average Reward Markov Decision Processes
Peng Liao
Zhengling Qi
Runzhe Wan
P. Klasnja
Susan Murphy
OffRL
131
85
0
23 Jul 2020
Hyperparameter Selection for Offline Reinforcement Learning
T. Paine
Cosmin Paduraru
Andrea Michi
Çağlar Gülçehre
Konrad Zolna
Alexander Novikov
Ziyun Wang
Nando de Freitas
GP, OffRL
195
148
0
17 Jul 2020
An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay
Scott Fujimoto
David Meger
Doina Precup
94
58
0
12 Jul 2020
Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning
Ming Yin
Yu Bai
Yu Wang
OffRL
93
31
0
07 Jul 2020
Off-Policy Evaluation via the Regularized Lagrangian
Mengjiao Yang
Ofir Nachum
Bo Dai
Lihong Li
Dale Schuurmans
OffRL
56
118
0
07 Jul 2020
Off-Policy Exploitability-Evaluation in Two-Player Zero-Sum Markov Games
Kenshi Abe
Yusuke Kaneko
OffRL
31
2
0
04 Jul 2020
Learning and Planning in Average-Reward Markov Decision Processes
Yi Wan
A. Naik
R. Sutton
OffRL
76
61
0
29 Jun 2020