ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2207.05852
  4. Cited By
Optimistic PAC Reinforcement Learning: the Instance-Dependent View

Optimistic PAC Reinforcement Learning: the Instance-Dependent View

12 July 2022
Andrea Tirinzoni
Aymen Al Marjani
E. Kaufmann
ArXiv (abs)PDFHTML

Papers citing "Optimistic PAC Reinforcement Learning: the Instance-Dependent View"

9 / 9 papers shown
Title
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
112
2
0
11 Jun 2024
Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds
  for Episodic Reinforcement Learning
Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning
Christoph Dann
T. V. Marinov
M. Mohri
Julian Zimmert
OffRL
50
31
0
02 Jul 2021
Fast active learning for pure exploration in reinforcement learning
Fast active learning for pure exploration in reinforcement learning
Pierre Ménard
O. D. Domingues
Anders Jonsson
E. Kaufmann
Edouard Leurent
Michal Valko
48
97
0
27 Jul 2020
Adaptive Reward-Free Exploration
Adaptive Reward-Free Exploration
E. Kaufmann
Pierre Ménard
O. D. Domingues
Anders Jonsson
Edouard Leurent
Michal Valko
58
82
0
11 Jun 2020
Planning in Markov Decision Processes with Gap-Dependent Sample
  Complexity
Planning in Markov Decision Processes with Gap-Dependent Sample Complexity
Anders Jonsson
E. Kaufmann
Pierre Ménard
O. D. Domingues
Edouard Leurent
Michal Valko
46
35
0
10 Jun 2020
Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs
Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs
Max Simchowitz
Kevin Jamieson
63
145
0
09 May 2019
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning
  without Domain Knowledge using Value Function Bounds
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
Andrea Zanette
Emma Brunskill
OffRL
104
276
0
01 Jan 2019
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement
  Learning
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Christoph Dann
Tor Lattimore
Emma Brunskill
76
309
0
22 Mar 2017
lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits
lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits
Kevin Jamieson
Matthew Malloy
Robert D. Nowak
Sébastien Bubeck
84
415
0
27 Dec 2013
1