arXiv: 2303.01464

Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation

2 March 2023
Orin Levy
Alon Cohen
Asaf B. Cassel
Yishay Mansour
Abstract

We present the OMG-CMDP! algorithm for regret minimization in adversarial Contextual MDPs (CMDPs). The algorithm operates under the minimal assumptions of a realizable function class and access to online least-squares and log-loss regression oracles. Our algorithm is efficient (assuming efficient online regression oracles), simple, and robust to approximation errors. It enjoys an $\widetilde{O}\big(H^{2.5} \sqrt{T|S||A| (\mathcal{R}(\mathcal{O}) + H \log(\delta^{-1}))}\big)$ regret guarantee, where $T$ is the number of episodes, $S$ the state space, $A$ the action space, $H$ the horizon, and $\mathcal{R}(\mathcal{O}) = \mathcal{R}(\mathcal{O}_{\mathrm{sq}}^{\mathcal{F}}) + \mathcal{R}(\mathcal{O}_{\mathrm{log}}^{\mathcal{P}})$ is the sum of the regression oracles' regret, used to approximate the context-dependent rewards and dynamics, respectively. To the best of our knowledge, our algorithm is the first efficient rate-optimal regret minimization algorithm for adversarial CMDPs that operates under the minimal standard assumption of online function approximation.
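
To get a feel for how the stated bound scales, the following minimal sketch evaluates only its leading term, $H^{2.5}\sqrt{T|S||A|(\mathcal{R}(\mathcal{O}) + H\log(\delta^{-1}))}$. It is an illustration, not part of the paper: the function name and all numeric inputs are hypothetical, the constants and logarithmic factors hidden by $\widetilde{O}$ are dropped, and $\delta$ is read as a failure probability in $(0, 1)$, which the abstract does not state explicitly.

```python
import math


def regret_bound_leading_term(H, T, num_states, num_actions, oracle_regret, delta):
    """Leading term of the stated regret bound.

    Constants and log factors hidden by the O-tilde notation are dropped,
    so the value is only meaningful for comparing scalings, not as an
    actual regret figure.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    # T * |S| * |A| * (R(O) + H * log(1 / delta))
    inner = T * num_states * num_actions * (oracle_regret + H * math.log(1.0 / delta))
    return H ** 2.5 * math.sqrt(inner)


# Hypothetical problem sizes, chosen only for illustration.
params = dict(H=10, num_states=20, num_actions=5, oracle_regret=50.0, delta=0.05)
b1 = regret_bound_leading_term(T=10_000, **params)
b4 = regret_bound_leading_term(T=40_000, **params)

# With the oracle regret held fixed (an assumption of this sketch),
# quadrupling the number of episodes roughly doubles the leading term.
print(b4 / b1)
```

Because the oracle regret $\mathcal{R}(\mathcal{O})$ sits inside the square root, any growth of the oracles' regret with $T$ would change this scaling; the sketch holds it fixed purely for simplicity.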
