Fast Rates for Maximum Entropy Exploration

14 March 2023
D. Tiapkin
Denis Belomestny
Daniele Calandriello
Eric Moulines
Rémi Munos
A. Naumov
Pierre Perrault
Yunhao Tang
Michal Valko
Pierre Ménard
Abstract

We address the challenge of exploration in reinforcement learning (RL) when the agent operates in an unknown environment with sparse or no rewards. In this work, we study the maximum entropy exploration problem of two different types. The first type is visitation entropy maximization, previously considered by Hazan et al. (2019) in the discounted setting. For this type of exploration, we propose a game-theoretic algorithm that has $\widetilde{\mathcal{O}}(H^3 S^2 A/\varepsilon^2)$ sample complexity, thus improving the $\varepsilon$-dependence upon existing results, where $S$ is the number of states, $A$ is the number of actions, $H$ is the episode length, and $\varepsilon$ is the desired accuracy. The second type of entropy we study is the trajectory entropy. This objective function is closely related to entropy-regularized MDPs, and we propose a simple algorithm that has a sample complexity of order $\widetilde{\mathcal{O}}(\mathrm{poly}(S,A,H)/\varepsilon)$. Interestingly, it is the first theoretical result in the RL literature that establishes the potential statistical advantage of regularized MDPs for exploration. Finally, we apply the developed regularization techniques to reduce the sample complexity of visitation entropy maximization to $\widetilde{\mathcal{O}}(H^2 S A/\varepsilon^2)$, yielding a statistical separation between maximum entropy exploration and reward-free exploration.
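For context, here is a minimal sketch of the two entropy objectives named in the abstract, written in standard finite-horizon notation. The normalization, and whether state or state-action visitations are used, are assumptions made for illustration and may differ from the paper's exact definitions.

Visitation entropy (assumed form): maximize the entropy of the state visitation distributions induced by a policy $\pi$ over an episode of length $H$,
\[
  \max_{\pi}\ \sum_{h=1}^{H} \mathcal{H}\!\left(d_h^{\pi}\right),
  \qquad d_h^{\pi}(s) = \mathbb{P}^{\pi}\!\left[s_h = s\right],
  \qquad \mathcal{H}(p) = -\sum_{s} p(s)\log p(s).
\]
Trajectory entropy (assumed form): maximize the entropy of the distribution over whole trajectories $\tau = (s_1, a_1, \dots, s_H, a_H)$ induced by $\pi$,
\[
  \max_{\pi}\ \mathcal{H}\!\left(\mathbb{P}^{\pi}(\tau)\right).
\]
By the chain rule, the trajectory entropy decomposes into a sum of expected per-step entropies of the policy and of the transition kernel, which is what connects this objective to entropy-regularized MDPs.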
