ResearchTrend.AI
arXiv:2410.23434
Model-free Low-Rank Reinforcement Learning via Leveraged Entry-wise Matrix Estimation

30 October 2024
Stefan Stojanovic
Yassir Jedra
Alexandre Proutiere
Abstract

We consider the problem of learning an $\varepsilon$-optimal policy in controlled dynamical systems with low-rank latent structure. For this problem, we present LoRa-PI (Low-Rank Policy Iteration), a model-free learning algorithm alternating between policy improvement and policy evaluation steps. In the latter, the algorithm estimates the low-rank matrix corresponding to the (state, action) value function of the current policy using the following two-phase procedure. The entries of the matrix are first sampled uniformly at random to estimate, via a spectral method, the leverage scores of its rows and columns. These scores are then used to extract a few important rows and columns whose entries are further sampled. The algorithm exploits these new samples to complete the matrix estimation using a CUR-like method. For this leveraged matrix estimation procedure, we establish entry-wise guarantees that, remarkably, do not depend on the coherence of the matrix but only on its spikiness. These guarantees imply that LoRa-PI learns an $\varepsilon$-optimal policy using $\widetilde{O}\left(\frac{S+A}{\mathrm{poly}(1-\gamma)\varepsilon^2}\right)$ samples, where $S$ (resp. $A$) denotes the number of states (resp. actions) and $\gamma$ the discount factor. Our algorithm achieves this order-optimal (in $S$, $A$, and $\varepsilon$) sample complexity under milder conditions than those assumed in previously proposed approaches.
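The two-phase estimation step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it uses a noiseless synthetic low-rank matrix in place of the (state, action) value function, assumes the rank $r$ is known, fully observes the selected rows and columns in the second phase, and all variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic rank-r matrix standing in for the Q-matrix of a fixed policy
# (illustrative stand-in; the paper estimates it from noisy samples).
S, A, r = 60, 40, 3
M = rng.standard_normal((S, r)) @ rng.standard_normal((r, A))

# Phase 1: uniform entry sampling, then a spectral estimate of leverage scores.
p = 0.3                                   # sampling probability (assumed)
mask = rng.random((S, A)) < p
M_obs = np.where(mask, M, 0.0) / p        # inverse-propensity rescaling
U, sv, Vt = np.linalg.svd(M_obs, full_matrices=False)
row_lev = np.sum(U[:, :r] ** 2, axis=1)   # row leverage scores (top-r subspace)
col_lev = np.sum(Vt[:r, :] ** 2, axis=0)  # column leverage scores

# Phase 2: keep the most-leveraged rows/columns, sample their entries,
# and complete the matrix with a CUR-like reconstruction.
k = 2 * r                                 # a few more rows/columns than the rank
rows = np.argsort(row_lev)[-k:]
cols = np.argsort(col_lev)[-k:]
C = M[:, cols]                            # sampled columns (noiseless here)
R = M[rows, :]                            # sampled rows
W = M[np.ix_(rows, cols)]                 # intersection block
M_hat = C @ np.linalg.pinv(W) @ R         # CUR reconstruction

err = np.max(np.abs(M_hat - M))
print(f"max entry-wise error: {err:.2e}")
```

With noiseless observations and an exactly rank-$r$ matrix, the CUR step recovers $M$ exactly whenever the intersection block has rank $r$; the paper's contribution is the entry-wise guarantee for the noisy-sample version of this scheme, depending on spikiness rather than coherence.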
