ResearchTrend.AI

arXiv:2103.06671
Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks

11 March 2021
Thanh Nguyen-Tang
Sunil R. Gupta
Hung The Tran
Svetha Venkatesh
    OffRL
Abstract

Offline reinforcement learning (RL) leverages previously collected data for policy optimization without any further active exploration. Despite recent interest in this problem, theoretical results in neural network function approximation settings remain elusive. In this paper, we study the statistical theory of offline RL with deep ReLU network function approximation. In particular, we establish a sample complexity of $n = \tilde{\mathcal{O}}\big( H^{4 + 4d/\alpha} \, \kappa_{\mu}^{1 + d/\alpha} \, \epsilon^{-2 - 2d/\alpha} \big)$ for offline RL with deep ReLU networks, where $\kappa_{\mu}$ is a measure of distributional shift, $H = (1-\gamma)^{-1}$ is the effective horizon length, $d$ is the dimension of the state-action space, $\alpha$ is a (possibly fractional) smoothness parameter of the underlying Markov decision process (MDP), and $\epsilon$ is a user-specified error. Notably, our sample complexity holds under two novel considerations: the Besov dynamic closure and the correlated structure. While the Besov dynamic closure subsumes the dynamic conditions for offline RL in prior works, the correlated structure renders prior approaches to offline RL with general or neural network function approximation improper or inefficient in long-(effective-)horizon problems. To the best of our knowledge, this is the first theoretical characterization of the sample complexity of offline RL with deep neural network function approximation under the general Besov regularity condition, which goes beyond the linearity regime of traditional reproducing kernel Hilbert spaces and neural tangent kernels.
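To get a feel for how the bound scales, the polynomial part of the stated rate can be evaluated directly. The sketch below is purely illustrative: it ignores the constants and logarithmic factors hidden in $\tilde{\mathcal{O}}(\cdot)$, and the specific parameter values plugged in (horizon, shift measure, dimension, smoothness) are assumptions chosen for the example, not values from the paper.

```python
def sample_complexity_bound(H, kappa_mu, epsilon, d, alpha):
    """Polynomial part of the offline-RL sample complexity bound
    n = O~( H^{4+4d/a} * kappa_mu^{1+d/a} * eps^{-2-2d/a} ),
    with constants and log factors omitted (illustrative only).

    H        : effective horizon length, (1 - gamma)^{-1}
    kappa_mu : measure of distributional shift
    epsilon  : user-specified target error
    d        : dimension of the state-action space
    alpha    : (possibly fractional) smoothness parameter of the MDP
    """
    r = d / alpha  # the d/alpha ratio that appears in every exponent
    return H ** (4 + 4 * r) * kappa_mu ** (1 + r) * epsilon ** (-(2 + 2 * r))

# Example (assumed values): gamma = 0.9 so H = 10, mild shift kappa_mu = 2,
# d = 4, alpha = 2, target error epsilon = 0.1.
n = sample_complexity_bound(H=10, kappa_mu=2.0, epsilon=0.1, d=4, alpha=2)
print(f"bound ~ {n:.2e} samples")
```

Note how the exponents on $H$ and $\epsilon^{-1}$ grow with $d/\alpha$: smoother dynamics (larger $\alpha$) or lower-dimensional state-action spaces (smaller $d$) directly tame the dependence on the horizon and the target error.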
