NeoRL: Efficient Exploration for Nonepisodic RL

3 June 2024
Bhavya Sukhija
Lenart Treven
Florian Dörfler
Stelian Coros
Andreas Krause
    OffRL
Abstract

We study the problem of nonepisodic reinforcement learning (RL) for nonlinear dynamical systems, where the system dynamics are unknown and the RL agent has to learn from a single trajectory, i.e., without resets. We propose Nonepisodic Optimistic RL (NeoRL), an approach based on the principle of optimism in the face of uncertainty. NeoRL uses well-calibrated probabilistic models and plans optimistically w.r.t. the epistemic uncertainty about the unknown dynamics. Under continuity and bounded energy assumptions on the system, we provide a first-of-its-kind regret bound of $O(\Gamma_T \sqrt{T})$ for general nonlinear systems with Gaussian process dynamics. We compare NeoRL to other baselines on several deep RL environments and empirically demonstrate that NeoRL achieves the optimal average cost while incurring the least regret.
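
As a rough illustration of the idea in the abstract, the sketch below plans optimistically with respect to the epistemic uncertainty of a learned Gaussian process dynamics model while acting on a single continuing trajectory without resets. It is a minimal sketch under stated assumptions, not the authors' implementation: the scalar system, the cost function, the confidence scaling beta, the horizon, and the random-shooting planner with hallucinated inputs eta in [-1, 1] are all illustrative choices.

```python
"""Minimal sketch of optimistic, nonepisodic exploration with GP dynamics.

A GP models the unknown scalar dynamics s' = f(s, a) from one continuing
trajectory (no resets). The planner searches jointly over actions and
"hallucinated" inputs eta in [-1, 1] that select a plausible transition
inside the model's confidence band, i.e. it is optimistic w.r.t. the
epistemic uncertainty. All hyperparameters here are illustrative.
"""
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def true_dynamics(s, a):      # unknown to the agent, queried only by acting
    return 0.9 * s + 0.5 * np.tanh(a)

def cost(s, a):               # known running cost: drive the state to 0
    return s ** 2 + 0.1 * a ** 2

def plan(gp, s0, beta=2.0, horizon=8, n_candidates=256):
    """Random-shooting planner, optimistic over the GP confidence band."""
    actions = rng.uniform(-1, 1, size=(n_candidates, horizon))
    etas = rng.uniform(-1, 1, size=(n_candidates, horizon))
    s = np.full(n_candidates, s0)
    total = np.zeros(n_candidates)
    for t in range(horizon):
        mu, sigma = gp.predict(np.column_stack([s, actions[:, t]]),
                               return_std=True)
        s = mu + beta * sigma * etas[:, t]   # hallucinated transition
        total += cost(s, actions[:, t])
    return actions[np.argmin(total), 0]      # first action of best rollout

# Single continuing trajectory: act, observe, refit -- never reset.
X, y, s = [], [], 1.5
for t in range(30):
    if len(X) < 3:
        a = rng.uniform(-1, 1)               # warm-up with random actions
    else:
        gp = GaussianProcessRegressor(kernel=RBF(), alpha=1e-4).fit(X, y)
        a = plan(gp, s)
    s_next = true_dynamics(s, a)
    X.append([s, a]); y.append(s_next)       # grow the single dataset
    s = s_next
print(f"final state: {s:.3f}")
```

Taking the minimum cost jointly over action sequences and the hallucinated inputs eta is one way to realize optimism in the face of uncertainty: the agent evaluates each plan under the most favorable dynamics the calibrated model still considers plausible, which rewards visiting uncertain regions.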

@article{sukhija2025_2406.01175,
  title={NeoRL: Efficient Exploration for Nonepisodic RL},
  author={Bhavya Sukhija and Lenart Treven and Florian Dörfler and Stelian Coros and Andreas Krause},
  journal={arXiv preprint arXiv:2406.01175},
  year={2025}
}