ResearchTrend.AI

Exploration-Exploitation in Constrained MDPs

4 March 2020
Yonathan Efroni
Shie Mannor
Matteo Pirotta
arXiv: 2003.02189

Papers citing "Exploration-Exploitation in Constrained MDPs"

34 / 34 papers shown
Guided by Guardrails: Control Barrier Functions as Safety Instructors for Robotic Learning
Maeva Guerrier
Karthik Soma
Hassan Fouad
Giovanni Beltrame
21
0
0
24 May 2025
Ensuring Safety in an Uncertain Environment: Constrained MDPs via Stochastic Thresholds
Qian Zuo
Fengxiang He
45
0
0
07 Apr 2025
Embedding Safety into RL: A New Take on Trust Region Methods
Nikola Milosevic
Johannes Müller
Nico Scherf
68
2
0
05 Nov 2024
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
Toshinori Kitamura
Tadashi Kozuno
Wataru Kumagai
Kenta Hoshino
Y. Hosoe
Kazumi Kasaura
Masashi Hamaya
Paavo Parmas
Yutaka Matsuo
84
2
0
29 Aug 2024
Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning
Subhojyoti Mukherjee
Josiah P. Hanna
Qiaomin Xie
Robert Nowak
139
2
0
07 Jun 2024
Learning Adversarial MDPs with Stochastic Hard Constraints
Francesco Emanuele Stradi
Matteo Castiglioni
A. Marchesi
Nicola Gatti
97
6
0
06 Mar 2024
Optimistic Policy Optimization with Bandit Feedback
Yonathan Efroni
Lior Shani
Aviv A. Rosenberg
Shie Mannor
38
90
0
19 Feb 2020
Improved Algorithms for Conservative Exploration in Bandits
Evrard Garcelon
Mohammad Ghavamzadeh
A. Lazaric
Matteo Pirotta
36
24
0
08 Feb 2020
Conservative Exploration in Reinforcement Learning
Evrard Garcelon
Mohammad Ghavamzadeh
A. Lazaric
Matteo Pirotta
37
28
0
08 Feb 2020
Constrained Upper Confidence Reinforcement Learning
Liyuan Zheng
Lillian J. Ratliff
48
68
0
26 Jan 2020
Provably Efficient Exploration in Policy Optimization
Qi Cai
Zhuoran Yang
Chi Jin
Zhaoran Wang
37
278
0
12 Dec 2019
Constrained Reinforcement Learning Has Zero Duality Gap
Santiago Paternain
Luiz F. O. Chamon
Miguel Calvo-Fullana
Alejandro Ribeiro
26
191
0
29 Oct 2019
Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies
Yonathan Efroni
Nadav Merlis
Mohammad Ghavamzadeh
Shie Mannor
OffRL
78
68
0
27 May 2019
Online Convex Optimization in Adversarial Markov Decision Processes
Aviv A. Rosenberg
Yishay Mansour
40
137
0
19 May 2019
End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks
Richard Cheng
G. Orosz
R. Murray
J. W. Burdick
49
613
0
21 Mar 2019
Lyapunov-based Safe Policy Optimization for Continuous Control
Yinlam Chow
Ofir Nachum
Aleksandra Faust
Edgar A. Duénez-Guzmán
Mohammad Ghavamzadeh
44
245
0
28 Jan 2019
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
Andrea Zanette
Emma Brunskill
OffRL
93
273
0
01 Jan 2019
Reward Constrained Policy Optimization
Chen Tessler
D. Mankowitz
Shie Mannor
61
540
0
28 May 2018
A Lyapunov-based Approach to Safe Reinforcement Learning
Yinlam Chow
Ofir Nachum
Edgar A. Duénez-Guzmán
Mohammad Ghavamzadeh
138
504
0
20 May 2018
Learning-based Model Predictive Control for Safe Exploration
Torsten Koller
Felix Berkenkamp
M. Turchetta
Andreas Krause
39
376
0
22 Mar 2018
Online Convex Optimization with Stochastic Constraints
Hao Yu
M. Neely
Xiaohan Wei
58
222
0
12 Aug 2017
Constrained Policy Optimization
Joshua Achiam
David Held
Aviv Tamar
Pieter Abbeel
91
1,313
0
30 May 2017
Safe Model-based Reinforcement Learning with Stability Guarantees
Felix Berkenkamp
M. Turchetta
Angela P. Schoellig
Andreas Krause
124
845
0
23 May 2017
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Christoph Dann
Tor Lattimore
Emma Brunskill
57
307
0
22 Mar 2017
Minimax Regret Bounds for Reinforcement Learning
M. G. Azar
Ian Osband
Rémi Munos
65
771
0
16 Mar 2017
Conservative Contextual Linear Bandits
Abbas Kazerouni
Mohammad Ghavamzadeh
Y. Abbasi
Benjamin Van Roy
77
98
0
19 Nov 2016
Fairness in Learning: Classic and Contextual Bandits
Matthew Joseph
Michael Kearns
Jamie Morgenstern
Aaron Roth
FaML
39
473
0
23 May 2016
Conservative Bandits
Yifan Wu
R. Shariff
Tor Lattimore
Csaba Szepesvári
102
98
0
13 Feb 2016
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
Yinlam Chow
Mohammad Ghavamzadeh
Lucas Janson
Marco Pavone
52
510
0
05 Dec 2015
Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning
Christoph Dann
Emma Brunskill
39
249
0
29 Oct 2015
Bandits with concave rewards and convex knapsacks
Shipra Agrawal
Nikhil R. Devanur
93
197
0
24 Feb 2014
Bandits with Knapsacks
Ashwinkumar Badanidiyuru
Robert D. Kleinberg
Aleksandrs Slivkins
62
429
0
11 May 2013
Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
M. Mahdavi
Rong Jin
Tianbao Yang
117
261
0
25 Nov 2011
Empirical Bernstein Bounds and Sample Variance Penalization
Andreas Maurer
Massimiliano Pontil
147
540
0
21 Jul 2009