arXiv:2109.14429

Minimal Expected Regret in Linear Quadratic Control

29 September 2021
Yassir Jedra
Alexandre Proutière
Abstract

We consider the problem of online learning in Linear Quadratic Control systems whose state transition and state-action transition matrices $A$ and $B$ may be initially unknown. We devise an online learning algorithm and provide guarantees on its expected regret. This regret at time $T$ is upper bounded (i) by $\widetilde{O}((d_u+d_x)\sqrt{d_x T})$ when $A$ and $B$ are unknown, (ii) by $\widetilde{O}(d_x^2\log(T))$ if only $A$ is unknown, and (iii) by $\widetilde{O}(d_x(d_u+d_x)\log(T))$ if only $B$ is unknown and under some mild non-degeneracy condition ($d_x$ and $d_u$ denote the dimensions of the state and of the control input, respectively). These regret scalings are minimal in $T$, $d_x$ and $d_u$ as they match existing lower bounds in scenario (i) when $d_x\le d_u$ [SF20], and in scenario (ii) [lai1986]. We conjecture that our upper bounds are also optimal in scenario (iii) (there is no known lower bound in this setting). Existing online algorithms proceed in epochs of (typically exponentially) growing durations. The control policy is fixed within each epoch, which considerably simplifies the analysis of the estimation error on $A$ and $B$ and hence of the regret. Our algorithm departs from this design choice: it is a simple variant of certainty-equivalence regulators, where the estimates of $A$ and $B$ and the resulting control policy can be updated as frequently as we wish, possibly at every step. Quantifying the impact of such a constantly-varying control policy on the performance of these estimates and on the regret constitutes one of the technical challenges tackled in this paper.
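
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea of a certainty-equivalence regulator whose estimates and gain are refreshed at every step, not the authors' method. It assumes a scalar system ($d_x = d_u = 1$) $x_{t+1} = a x_t + b u_t + w_t$ with cost $q x_t^2 + r u_t^2$. The function names `riccati_gain` and `run_ce_lqr`, the ridge parameter `lam`, the warm-up length, and the $t^{-1/4}$ exploration schedule are all assumptions made for this example.

```python
import math
import random


def riccati_gain(a, b, q=1.0, r=1.0, iters=200):
    """Scalar discrete-time LQR gain k (control u = -k * x).

    Iterates the scalar Riccati recursion p <- q + a^2 * p * r / (r + b^2 * p)
    and returns k = a * b * p / (r + b^2 * p).
    """
    p = q
    for _ in range(iters):
        p = q + a * a * p * r / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def run_ce_lqr(T=2000, a=0.9, b=0.5, q=1.0, r=1.0, warmup=50, lam=1e-3, seed=0):
    """Simulate a certainty-equivalence controller updated at every step.

    Assumptions of this sketch (not taken from the paper): the true system is
    open-loop stable, so k = 0 is a safe gain during warm-up; exploration
    noise has unit scale during warm-up and then decays like t^(-1/4).
    """
    rng = random.Random(seed)
    # Running sums for ridge-regularised least squares on (x, u) -> x_next.
    sxx = sxu = suu = sxy = suy = 0.0
    a_hat = b_hat = 0.0
    x, k, total_cost = 0.0, 0.0, 0.0
    for t in range(1, T + 1):
        sigma = 1.0 if t <= warmup else t ** -0.25
        u = -k * x + sigma * rng.gauss(0.0, 1.0)
        total_cost += q * x * x + r * u * u
        x_next = a * x + b * u + rng.gauss(0.0, 1.0)

        # Update the least-squares statistics with the new transition.
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * x_next
        suy += u * x_next

        # Solve the 2x2 regularised normal equations in closed form.
        det = (sxx + lam) * (suu + lam) - sxu * sxu
        a_hat = ((suu + lam) * sxy - sxu * suy) / det
        b_hat = ((sxx + lam) * suy - sxu * sxy) / det

        # Certainty equivalence: treat the estimates as the true parameters
        # and recompute the gain at every step once warm-up is over.
        if t >= warmup:
            k_new = riccati_gain(a_hat, b_hat, q, r)
            if math.isfinite(k_new):
                k = k_new
        x = x_next
    return a_hat, b_hat, k, total_cost / T


if __name__ == "__main__":
    a_hat, b_hat, k, avg_cost = run_ce_lqr()
    print("estimates:", a_hat, b_hat)
    print("learned gain:", k, "optimal gain:", riccati_gain(0.9, 0.5))
    print("average cost:", avg_cost)
```

Refreshing the gain at every step, rather than once per epoch, is what makes the regressors depend on a constantly changing policy, which is the analytical difficulty the abstract highlights. The sketch makes no claim about matching the regret rates stated above.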
