Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1811.03056
Cited By
Policy Certificates: Towards Accountable Reinforcement Learning
7 November 2018
Christoph Dann
Ashutosh Adhikari
Wei Wei
Jimmy J. Lin
OffRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Policy Certificates: Towards Accountable Reinforcement Learning"
39 / 39 papers shown
Title
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
Zhong Zheng
Haochen Zhang
Lingzhou Xue
OffRL
78
2
0
10 Oct 2024
Multiple-policy Evaluation via Density Estimation
Yilei Chen
Aldo Pacchiano
I. Paschalidis
OffRL
29
0
0
29 Mar 2024
A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees
Toshinori Kitamura
Tadashi Kozuno
Masahiro Kato
Yuki Ichihara
Soichiro Nishimori
Akiyoshi Sannai
Sho Sonoda
Wataru Kumagai
Yutaka Matsuo
42
2
0
31 Jan 2024
Settling the Sample Complexity of Online Reinforcement Learning
Zihan Zhang
Yuxin Chen
Jason D. Lee
S. Du
OffRL
98
22
0
25 Jul 2023
Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data
Ruiqi Zhang
Andrea Zanette
OffRL
OnRL
40
7
0
10 Jul 2023
Fast Rates for Maximum Entropy Exploration
D. Tiapkin
Denis Belomestny
Daniele Calandriello
Eric Moulines
Rémi Munos
A. Naumov
Pierre Perrault
Yunhao Tang
Michal Valko
Pierre Menard
44
17
0
14 Mar 2023
Provably Efficient Reinforcement Learning via Surprise Bound
Hanlin Zhu
Ruosong Wang
Jason D. Lee
OffRL
28
5
0
22 Feb 2023
Learning in POMDPs is Sample-Efficient with Hindsight Observability
Jonathan Lee
Alekh Agarwal
Christoph Dann
Tong Zhang
34
19
0
31 Jan 2023
Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments
Runlong Zhou
Zihan Zhang
S. Du
44
10
0
31 Jan 2023
Understanding the Complexity Gains of Single-Task RL with a Curriculum
Qiyang Li
Yuexiang Zhai
Yi Ma
Sergey Levine
37
14
0
24 Dec 2022
Opportunistic Episodic Reinforcement Learning
Xiaoxiao Wang
Nader Bouacida
Xueying Guo
Xin Liu
21
0
0
24 Oct 2022
Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning
Zihan Zhang
Yuhang Jiang
Yuanshuo Zhou
Xiangyang Ji
OffRL
26
9
0
15 Oct 2022
Square-root regret bounds for continuous-time episodic Markov decision processes
Xuefeng Gao
X. Zhou
43
6
0
03 Oct 2022
Horizon-Free Reinforcement Learning in Polynomial Time: the Power of Stationary Policies
Zihan Zhang
Xiangyang Ji
S. Du
30
21
0
24 Mar 2022
Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost
Dan Qiao
Ming Yin
Ming Min
Yu-Xiang Wang
43
28
0
13 Feb 2022
Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning
Yuanzhi Li
Ruosong Wang
Lin F. Yang
25
20
0
01 Nov 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation
Zaiwei Chen
S. Khodadadian
S. T. Maguluri
OffRL
63
29
0
26 May 2021
Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
Ming Yin
Yu-Xiang Wang
OffRL
32
19
0
13 May 2021
Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation
Andrea Zanette
Ching-An Cheng
Alekh Agarwal
32
52
0
24 Mar 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
S. Khodadadian
Zaiwei Chen
S. T. Maguluri
CML
OffRL
71
26
0
18 Feb 2021
Task-Optimal Exploration in Linear Dynamical Systems
Andrew Wagenmaker
Max Simchowitz
Kevin G. Jamieson
22
18
0
10 Feb 2021
A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants
Zaiwei Chen
S. T. Maguluri
Sanjay Shakkottai
Karthikeyan Shanmugam
OffRL
105
54
0
02 Feb 2021
Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP
Zihan Zhang
Jiaqi Yang
Xiangyang Ji
S. Du
71
36
0
29 Jan 2021
Learning to be Safe: Deep RL with a Safety Critic
K. Srinivasan
Benjamin Eysenbach
Sehoon Ha
Jie Tan
Chelsea Finn
OffRL
33
143
0
27 Oct 2020
Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs
Jiafan He
Dongruo Zhou
Quanquan Gu
16
37
0
01 Oct 2020
Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration
Andrea Zanette
A. Lazaric
Mykel J. Kochenderfer
Emma Brunskill
22
64
0
18 Aug 2020
Near-Optimal Reinforcement Learning with Self-Play
Yunru Bai
Chi Jin
Tiancheng Yu
19
129
0
22 Jun 2020
Task-agnostic Exploration in Reinforcement Learning
Xuezhou Zhang
Yuzhe Ma
Adish Singla
OffRL
20
49
0
16 Jun 2020
Q
Q
Q
-learning with Logarithmic Regret
Kunhe Yang
Lin F. Yang
S. Du
43
59
0
16 Jun 2020
Model-Based Reinforcement Learning with Value-Targeted Regression
Alex Ayoub
Zeyu Jia
Csaba Szepesvári
Mengdi Wang
Lin F. Yang
OffRL
19
299
0
01 Jun 2020
Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension
Ruosong Wang
Ruslan Salakhutdinov
Lin F. Yang
23
55
0
21 May 2020
Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
Yaqi Duan
Mengdi Wang
OffRL
18
149
0
21 Feb 2020
Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions
Omer Gottesman
Joseph D. Futoma
Yao Liu
Soanli Parbhoo
Leo Anthony Celi
Emma Brunskill
Finale Doshi-Velez
OffRL
147
55
0
10 Feb 2020
VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
L. Zintgraf
K. Shiarlis
Maximilian Igl
Sebastian Schulze
Y. Gal
Katja Hofmann
Shimon Whiteson
OffRL
11
272
0
18 Oct 2019
Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies
Yonathan Efroni
Nadav Merlis
Mohammad Ghavamzadeh
Shie Mannor
OffRL
22
68
0
27 May 2019
Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound
Lin F. Yang
Mengdi Wang
OffRL
GP
13
283
0
24 May 2019
HARK Side of Deep Learning -- From Grad Student Descent to Automated Machine Learning
O. Gencoglu
M. Gils
E. Guldogan
Chamin Morikawa
Mehmet Süzen
M. Gruber
J. Leinonen
H. Huttunen
11
36
0
16 Apr 2019
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
Andrea Zanette
Emma Brunskill
OffRL
24
273
0
01 Jan 2019
Agent Embeddings: A Latent Representation for Pole-Balancing Networks
Oscar Chang
Robert Kwiatkowski
Siyuan Chen
Hod Lipson
24
6
0
12 Nov 2018
1