Variance-reduced Q-learning is minimax optimal

11 June 2019
Martin J. Wainwright
    OffRL
Abstract

We introduce and analyze a form of variance-reduced $Q$-learning. For $\gamma$-discounted MDPs with finite state space $\mathcal{X}$ and action space $\mathcal{U}$, we prove that it yields an $\epsilon$-accurate estimate of the optimal $Q$-function in the $\ell_\infty$-norm using $\mathcal{O}\left( \left( \frac{D}{\epsilon^2 (1-\gamma)^3} \right) \log\left( \frac{D}{1-\gamma} \right) \right)$ samples, where $D = |\mathcal{X}| \times |\mathcal{U}|$. This guarantee matches known minimax lower bounds up to a logarithmic factor in the discount complexity. In contrast, our past work shows that ordinary $Q$-learning has worst-case quartic scaling in the discount complexity.
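The abstract does not spell out the recentering mechanism, so below is a minimal NumPy sketch of an SVRG-style, epoch-based variance-reduced Q-learning loop in the synchronous generative-model setting. The function and parameter names (`variance_reduced_q_learning`, the generative model `sample(s, a)`, the epoch count, batch sizes, and the rescaled-linear stepsize) are illustrative assumptions for this sketch, not the paper's exact algorithm or constants.

```python
import numpy as np

def variance_reduced_q_learning(sample, n_states, n_actions, gamma,
                                n_epochs=5, recenter_batch=200, inner_steps=500):
    """Sketch of epoch-based variance-reduced Q-learning.

    `sample(s, a)` is an assumed generative model returning a tuple
    (reward, next_state) drawn from the MDP at state-action pair (s, a).
    """
    Q_bar = np.zeros((n_states, n_actions))  # reference point for recentering

    for _ in range(n_epochs):
        # Step 1: Monte Carlo estimate of the Bellman operator applied to the
        # reference point Q_bar, averaged over a large batch of samples.
        T_bar = np.zeros_like(Q_bar)
        for _ in range(recenter_batch):
            for s in range(n_states):
                for a in range(n_actions):
                    r, s_next = sample(s, a)
                    T_bar[s, a] += r + gamma * Q_bar[s_next].max()
        T_bar /= recenter_batch

        # Step 2: recentered stochastic updates. The single-sample Bellman
        # estimates at Q and at Q_bar share the same sampled transition, so
        # the reward terms cancel and the noise of their difference scales
        # with ||Q - Q_bar||_inf; T_bar supplies a low-noise mean at Q_bar.
        Q = Q_bar.copy()
        for k in range(1, inner_steps + 1):
            lam = 1.0 / (1.0 + (1.0 - gamma) * k)  # rescaled linear stepsize
            for s in range(n_states):
                for a in range(n_actions):
                    r, s_next = sample(s, a)
                    target = ((r + gamma * Q[s_next].max())
                              - (r + gamma * Q_bar[s_next].max())
                              + T_bar[s, a])
                    Q[s, a] = (1.0 - lam) * Q[s, a] + lam * target
        Q_bar = Q  # recenter the next epoch at the improved estimate

    return Q_bar
```

In the paper's analysis the recentering batch sizes grow across epochs so that each epoch roughly halves the $\ell_\infty$-error; the fixed `recenter_batch` used here is a simplification for readability.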

View on arXiv