ResearchTrend.AI
Deflated Dynamics Value Iteration

15 July 2024
Jongmin Lee
Amin Rakhsha
Ernest K. Ryu
Amir-massoud Farahmand
Abstract

The Value Iteration (VI) algorithm is an iterative procedure for computing the value function of a Markov decision process, and is the basis of many reinforcement learning (RL) algorithms as well. Since the error convergence rate of VI as a function of the iteration $k$ is $O(\gamma^k)$, it is slow when the discount factor $\gamma$ is close to $1$. To accelerate the computation of the value function, we propose Deflated Dynamics Value Iteration (DDVI). DDVI uses matrix splitting and matrix deflation techniques to effectively remove (deflate) the top $s$ dominant eigen-structure of the transition matrix $\mathcal{P}^{\pi}$. We prove that this leads to a $\tilde{O}(\gamma^k |\lambda_{s+1}|^k)$ convergence rate, where $\lambda_{s+1}$ is the $(s+1)$-th largest eigenvalue of the dynamics matrix. We then extend DDVI to the RL setting and present the Deflated Dynamics Temporal Difference (DDTD) algorithm. We empirically show the effectiveness of the proposed algorithms.
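As background for the abstract, the baseline the paper accelerates can be sketched as plain Value Iteration, whose error contracts by a factor of $\gamma$ per sweep. The sketch below runs VI on a small random MDP; the MDP data (`P`, `R`, the sizes, and the seed) are hypothetical placeholders, and the deflation step that distinguishes DDVI is deliberately not reproduced here.

```python
import numpy as np

# Minimal sketch of standard Value Iteration on a random MDP (illustrative
# only; DDVI additionally deflates the top-s eigen-structure of the
# transition matrix, which is not shown here).
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9

# Hypothetical transition kernel P[a, s, s'] (rows normalized to sum to 1)
# and reward table R[s, a].
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((n_states, n_actions))

V = np.zeros(n_states)
for k in range(1000):
    # Bellman optimality update:
    # V(s) <- max_a [ R(s, a) + gamma * sum_s' P(s' | s, a) V(s') ]
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
```

Because the update is a $\gamma$-contraction in the sup-norm, the error after $k$ sweeps is $O(\gamma^k)$, which is exactly the slow regime (for $\gamma$ near $1$) that DDVI targets.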
