Eluder-based Regret for Stochastic Contextual MDPs

27 November 2022
Orin Levy
Asaf B. Cassel
Alon Cohen
Yishay Mansour
arXiv: 2211.14932
Abstract

We present the E-UC$^3$RL algorithm for regret minimization in Stochastic Contextual Markov Decision Processes (CMDPs). The algorithm operates under the minimal assumptions of a realizable function class and access to offline least-squares and log-loss regression oracles. Our algorithm is efficient (assuming efficient offline regression oracles) and enjoys a regret guarantee of $\widetilde{O}\big(H^3 \sqrt{T |S| |A| d_{\mathrm{E}}(\mathcal{P}) \log(|\mathcal{F}| |\mathcal{P}| / \delta)}\big)$, where $T$ is the number of episodes, $S$ the state space, $A$ the action space, $H$ the horizon, $\mathcal{P}$ and $\mathcal{F}$ are finite function classes used to approximate the context-dependent dynamics and rewards, respectively, and $d_{\mathrm{E}}(\mathcal{P})$ is the Eluder dimension of $\mathcal{P}$ with respect to the Hellinger distance. To the best of our knowledge, our algorithm is the first efficient and rate-optimal regret minimization algorithm for CMDPs that operates under the general offline function approximation setting. In addition, we extend the Eluder dimension to general bounded metrics, which may be of independent interest.
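
To make the quantities in the bound concrete, here is a minimal Python sketch. It is not taken from the paper: the function names are hypothetical, the first function evaluates only the leading term of the stated bound with all constants and polylogarithmic factors dropped, and the second uses one common normalization of the Hellinger distance between discrete distributions (the abstract does not say which convention the paper adopts).

```python
import math


def regret_bound_leading_term(H, T, S, A, d_E, F_size, P_size, delta):
    """Leading term H^3 * sqrt(T |S| |A| d_E(P) log(|F||P|/delta)).

    Hypothetical helper: constants and polylog factors hidden by the
    O-tilde notation are ignored, so only the scaling is meaningful.
    S and A are the sizes of the state and action spaces.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    log_term = math.log(F_size * P_size / delta)
    return H ** 3 * math.sqrt(T * S * A * d_E * log_term)


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions.

    Assumes p and q are sequences of probabilities over the same support.
    Uses the convention with the 1/2 factor, so values lie in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    squared = 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared)
```

Under these assumptions, quadrupling the number of episodes $T$ doubles the leading term, which reflects the $\sqrt{T}$ rate, while doubling the horizon $H$ multiplies it by eight.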
