Convergence of TD(0) under Polynomial Mixing with Nonlinear Function Approximation

8 February 2025
Anupama Sridhar
Alexander Johansen
Abstract

Temporal Difference Learning (TD(0)) is fundamental in reinforcement learning, yet its finite-sample behavior under non-i.i.d. data and nonlinear approximation remains unknown. We provide the first high-probability, finite-sample analysis of vanilla TD(0) on polynomially mixing Markov data, assuming only Hölder continuity and bounded generalized gradients. This breaks with previous work, which often requires subsampling, projections, or instance-dependent step sizes. Concretely, for mixing exponent $\beta > 1$, Hölder continuity exponent $\gamma$, and step-size decay rate $\eta \in (1/2, 1]$, we show that, with high probability,
\[
\| \theta_t - \theta^* \| \leq C(\beta, \gamma, \eta)\, t^{-\beta/2} + C'(\gamma, \eta)\, t^{-\eta\gamma}
\]
after $t = \mathcal{O}(1/\varepsilon^2)$ iterations. These bounds match the known i.i.d. rates and hold even when initialization is nonstationary. Central to our proof is a novel discrete-time coupling that bypasses geometric ergodicity, yielding the first such guarantee for nonlinear TD(0) under realistic mixing.
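
For orientation, here is a minimal sketch of the update rule the abstract refers to: vanilla TD(0) with a nonlinear value-function approximator and polynomially decaying step sizes $\alpha_t \propto t^{-\eta}$, $\eta \in (1/2, 1]$, driven by a single Markov trajectory with no projection or subsampling. The Markov chain, features, rewards, and tanh parametrization below are illustrative assumptions, not taken from the paper.

# Minimal sketch (not the authors' code): vanilla TD(0) with a nonlinear
# approximator and step sizes alpha_t = alpha0 * (t+1)^(-eta), run on one
# Markov trajectory. Chain, rewards, and features are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

n_states, dim = 10, 4
P = rng.dirichlet(np.ones(n_states), size=n_states)   # Markov transition matrix
r = rng.normal(size=n_states)                          # per-state rewards
Phi = rng.normal(size=(n_states, dim))                 # state features
discount = 0.95

def value(theta, s):
    # Nonlinear approximation of V(s); smooth, hence Hölder continuous in theta.
    return np.tanh(Phi[s] @ theta)

def grad_value(theta, s):
    # (Generalized) gradient of the approximator w.r.t. theta; bounded here.
    return (1.0 - np.tanh(Phi[s] @ theta) ** 2) * Phi[s]

def td0(T=50_000, alpha0=0.5, eta=0.75):
    # theta_{t+1} = theta_t + alpha_t * delta_t * grad V(s_t), eta in (1/2, 1].
    theta = np.zeros(dim)
    s = 0                                              # nonstationary start state
    for t in range(T):
        s_next = rng.choice(n_states, p=P[s])
        delta = r[s] + discount * value(theta, s_next) - value(theta, s)
        alpha_t = alpha0 * (t + 1) ** (-eta)
        theta = theta + alpha_t * delta * grad_value(theta, s)
        s = s_next
    return theta

theta_hat = td0()
print(theta_hat)

The point of the sketch is only to fix notation: theta_t is the iterate whose distance to theta^* the bound above controls, and eta is the step-size decay rate appearing in the $t^{-\eta\gamma}$ term.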

View on arXiv
@article{sridhar2025_2502.05706,
  title={Convergence of TD(0) under Polynomial Mixing with Nonlinear Function Approximation},
  author={Anupama Sridhar and Alexander Johansen},
  journal={arXiv preprint arXiv:2502.05706},
  year={2025}
}