  3. 1905.12842

Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator

30 May 2019
K. Krauth
Stephen Tu
Benjamin Recht
Abstract

We study the sample complexity of approximate policy iteration (PI) for the Linear Quadratic Regulator (LQR), building on a recent line of work using LQR as a testbed to understand the limits of reinforcement learning (RL) algorithms on continuous control tasks. Our analysis quantifies the tension between policy improvement and policy evaluation, and suggests that policy evaluation is the dominant factor in terms of sample complexity. Specifically, we show that to obtain a controller that is within $\varepsilon$ of the optimal LQR controller, each step of policy evaluation requires at most $(n+d)^3/\varepsilon^2$ samples, where $n$ is the dimension of the state vector and $d$ is the dimension of the input vector. On the other hand, only $\log(1/\varepsilon)$ policy improvement steps suffice, resulting in an overall sample complexity of $(n+d)^3 \varepsilon^{-2} \log(1/\varepsilon)$. We furthermore build on our analysis and construct a simple adaptive procedure based on $\varepsilon$-greedy exploration which relies on approximate PI as a sub-routine and obtains $T^{2/3}$ regret, improving upon a recent result of Abbasi-Yadkori et al.
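To make the structure of the algorithm concrete, below is a minimal sketch of exact policy iteration for discrete-time LQR, the idealized counterpart of the approximate PI analyzed in the paper, in which the policy-evaluation step would instead be estimated from trajectory samples. The function name `exact_policy_iteration`, the policy convention $u_t = K x_t$, and the use of SciPy's Lyapunov solver are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def exact_policy_iteration(A, B, Q, R, K0, num_iters=20):
    """Exact policy iteration for discrete-time LQR.

    Dynamics: x_{t+1} = A x_t + B u_t, policy u_t = K x_t,
    stage cost x^T Q x + u^T R u.  K0 must be stabilizing,
    i.e. the spectral radius of A + B K0 is below 1.

    Illustrative sketch only: the paper's approximate PI replaces the
    exact policy-evaluation step with a sample-based Q-function estimate.
    """
    K = K0
    for _ in range(num_iters):
        # Policy evaluation: value matrix V of the current policy solves
        #   V = Q + K^T R K + (A + B K)^T V (A + B K)
        A_cl = A + B @ K
        V = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
        # Policy improvement: greedy gain w.r.t. the Q-function induced by V
        #   K_new = -(R + B^T V B)^{-1} B^T V A
        K = -np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
    return K
```

In the sample-based setting the paper studies, each evaluation step is estimated from data rather than computed from $(A, B)$, which is where the per-step $(n+d)^3/\varepsilon^2$ sample requirement arises, while the outer improvement loop still needs only $\log(1/\varepsilon)$ iterations.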
