A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes

15 May 2023
Han Zhong
Tong Zhang
Abstract

The proximal policy optimization (PPO) algorithm stands as one of the most successful methods in the field of reinforcement learning (RL). Despite its success, the theoretical understanding of PPO remains limited. Specifically, it is unclear whether PPO or its optimistic variants can effectively solve linear Markov decision processes (MDPs), which are arguably the simplest models in RL with function approximation. To bridge this gap, we propose an optimistic variant of PPO for episodic adversarial linear MDPs with full-information feedback, and establish an $\tilde{\mathcal{O}}(d^{3/4}H^2K^{3/4})$ regret bound for it. Here $d$ is the ambient dimension of linear MDPs, $H$ is the length of each episode, and $K$ is the number of episodes. Compared with existing policy-based algorithms, we achieve the state-of-the-art regret bound in both stochastic linear MDPs and adversarial linear MDPs with full information. Additionally, our algorithm design features a novel multi-batched updating mechanism, and the theoretical analysis utilizes a new covering-number argument for the value and policy classes, which might be of independent interest.
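
For context, regret bounds of this form are usually stated against the standard episodic notion of cumulative regret over $K$ episodes. The display below is a sketch of that standard definition, assuming the usual comparator over fixed policies; the paper's exact definition may differ in details such as the comparator class or the handling of adversarial rewards.

% Sketch of the standard episodic regret notion (assumed form, not quoted
% from the paper): regret against the best fixed policy in hindsight.
\[
  \mathrm{Regret}(K) \;=\; \max_{\pi} \sum_{k=1}^{K}
    \Bigl( V_{1}^{\pi,\,k}(s_1) \;-\; V_{1}^{\pi_k,\,k}(s_1) \Bigr)
  \;\le\; \tilde{\mathcal{O}}\!\bigl(d^{3/4} H^{2} K^{3/4}\bigr),
\]
% where $\pi_k$ is the learner's policy in episode $k$, $V_1^{\pi,k}$ is the
% value of policy $\pi$ under the episode-$k$ (possibly adversarial) reward,
% $s_1$ is the initial state, $d$ is the feature dimension of the linear MDP,
% $H$ the episode length, and $K$ the number of episodes.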
