ResearchTrend.AI
arXiv:2201.06169

On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation

17 January 2022
Xiaohong Chen
Zhengling Qi
    OffRL
Abstract

We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision process with continuous states and actions. We recast $Q$-function estimation as a special form of the nonparametric instrumental variables (NPIV) estimation problem. We first show that, under one mild condition, the NPIV formulation of $Q$-function estimation is well-posed in the sense of the $L^2$-measure of ill-posedness with respect to the data-generating distribution, bypassing a strong assumption on the discount factor $\gamma$ imposed in the recent literature for obtaining $L^2$ convergence rates of various $Q$-function estimators. Thanks to this new well-posedness property, we derive the first minimax lower bounds for the convergence rates of nonparametric estimation of the $Q$-function and its derivatives in both sup-norm and $L^2$-norm, which are shown to be the same as those for classical nonparametric regression (Stone, 1982). We then propose a sieve two-stage least squares estimator and establish its rate-optimality in both norms under mild conditions. Our general results on well-posedness and minimax lower bounds are of independent interest for studying not only other nonparametric estimators of the $Q$-function but also efficient estimation of the value of any target policy in off-policy settings.
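To make the NPIV connection concrete: for a target policy $\pi$, the Bellman equation $Q^\pi(s,a) = \mathbb{E}[R_t + \gamma \int Q^\pi(S_{t+1}, a')\,\pi(da' \mid S_{t+1}) \mid S_t = s, A_t = a]$ is a conditional-moment restriction in which $(S_t, A_t)$ plays the role of the instrument, so $Q^\pi$ can be estimated by sieve two-stage least squares. The sketch below illustrates the sieve-2SLS mechanics on a generic one-dimensional NPIV toy problem, not the paper's MDP setting; all variable names, basis choices, and sieve dimensions are illustrative assumptions.

```python
import numpy as np

# Hypothetical one-dimensional NPIV toy problem, illustrating the
# sieve two-stage least squares (2SLS) mechanics only; the paper's
# actual estimator targets the Q-function in an MDP.
rng = np.random.default_rng(0)
n = 2000
W = rng.uniform(-1.0, 1.0, n)          # instrument
U = rng.uniform(-1.0, 1.0, n)          # unobserved confounder
X = 0.8 * W + 0.2 * U                  # endogenous regressor
h0 = lambda x: np.sin(np.pi * x)       # structural function to recover
Y = h0(X) + 0.5 * U + 0.1 * rng.standard_normal(n)  # E[error | W] = 0

def poly_sieve(v, k):
    """Polynomial sieve basis 1, v, ..., v^(k-1)."""
    return np.vander(np.atleast_1d(v), k, increasing=True)

k_x, k_w = 6, 8                        # sieve dimensions (illustrative)
B = poly_sieve(X, k_x)                 # basis for the unknown function
A = poly_sieve(W, k_w)                 # richer basis for the instrument
P = A @ np.linalg.pinv(A.T @ A) @ A.T  # projection onto instrument sieve
beta = np.linalg.pinv(B.T @ P @ B) @ (B.T @ P @ Y)  # 2SLS coefficients

h_hat = lambda x: poly_sieve(x, k_x) @ beta
grid = np.linspace(-0.8, 0.8, 41)
max_err = np.max(np.abs(h_hat(grid) - h0(grid)))
print(f"max abs error on grid: {max_err:.3f}")
```

The second basis is taken richer than the first ($k_w > k_x$) so that the projected design matrix stays well-conditioned; a naive least-squares fit of $Y$ on the basis in $X$ would be biased here because the error term is correlated with $X$ through $U$.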
