1712.06924
Cited By
Safe Policy Improvement with Baseline Bootstrapping
19 December 2017
Romain Laroche
P. Trichelair
Rémi Tachet des Combes
OffRL
Papers citing
"Safe Policy Improvement with Baseline Bootstrapping"
50 / 55 papers shown
SNPL: Simultaneous Policy Learning and Evaluation for Safe Multi-Objective Policy Improvement
Brian Cho
Ana-Roxana Pop
Ariel Evince
Nathan Kallus
OffRL
46
0
0
17 Mar 2025
Mitigating Preference Hacking in Policy Optimization with Pessimism
Dhawal Gupta
Adam Fisch
Christoph Dann
Alekh Agarwal
76
0
0
10 Mar 2025
MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning
Mao Hong
Zhiyue Zhang
Yue Wu
Yan Xu
OffRL
50
0
0
21 Jan 2024
Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation
Paul Daoudi
Mathias Formoso
Othman Gaizi
Achraf Azize
Evrard Garcelon
OffRL
26
0
0
24 Dec 2023
Neural Network Approximation for Pessimistic Offline Reinforcement Learning
Di Wu
Yuling Jiao
Li Shen
Haizhao Yang
Xiliang Lu
OffRL
29
1
0
19 Dec 2023
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets
Zhang-Wei Hong
Aviral Kumar
Sathwik Karnik
Abhishek Bhandwaldar
Akash Srivastava
Joni Pajarinen
Romain Laroche
Abhishek Gupta
Pulkit Agrawal
OffRL
38
19
0
06 Oct 2023
Stackelberg Batch Policy Learning
Wenzhuo Zhou
Annie Qu
OffRL
35
0
0
28 Sep 2023
MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations
Anqi Li
Byron Boots
Ching-An Cheng
OffRL
28
16
0
30 Mar 2023
Safe Policy Improvement for POMDPs via Finite-State Controllers
T. D. Simão
Marnix Suilen
N. Jansen
OffRL
32
9
0
12 Jan 2023
Sustainable Online Reinforcement Learning for Auto-bidding
Zhiyu Mou
Yusen Huo
Rongquan Bai
Mingzhou Xie
Chuan Yu
Jian Xu
Bo Zheng
OffRL
OnRL
34
15
0
13 Oct 2022
Incorporating Explicit Uncertainty Estimates into Deep Offline Reinforcement Learning
David Brandfonbrener
Rémi Tachet des Combes
Romain Laroche
OffRL
37
5
0
02 Jun 2022
Non-Markovian policies occupancy measures
Romain Laroche
Rémi Tachet des Combes
Jacob Buckman
OffRL
37
1
0
27 May 2022
User-Interactive Offline Reinforcement Learning
Phillip Swazinna
Steffen Udluft
Thomas Runkler
OffRL
25
11
0
21 May 2022
LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation
Geon-hyeong Kim
Jongmin Lee
Youngsoo Jang
Hongseok Yang
Kyungmin Kim
OffRL
33
15
0
28 Feb 2022
Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms
Romain Laroche
Rémi Tachet des Combes
46
2
0
15 Feb 2022
Adversarially Trained Actor Critic for Offline Reinforcement Learning
Ching-An Cheng
Tengyang Xie
Nan Jiang
Alekh Agarwal
OffRL
16
125
0
05 Feb 2022
Quantile Filtered Imitation Learning
David Brandfonbrener
William F. Whitney
Rajesh Ranganath
Joan Bruna
33
6
0
02 Dec 2021
Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation
Dylan J. Foster
A. Krishnamurthy
D. Simchi-Levi
Yunzong Xu
OffRL
21
62
0
21 Nov 2021
Exploiting Action Impact Regularity and Exogenous State Variables for Offline Reinforcement Learning
Vincent Liu
James Wright
Martha White
OffRL
31
1
0
15 Nov 2021
AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale
Yao Lu
Karol Hausman
Yevgen Chebotar
Mengyuan Yan
Eric Jang
...
Ted Xiao
A. Irpan
Mohi Khansari
Dmitry Kalashnikov
Sergey Levine
OffRL
95
59
0
09 Nov 2021
Safe Data Collection for Offline and Online Policy Learning
Ruihao Zhu
B. Kveton
OffRL
11
5
0
08 Nov 2021
Offline Reinforcement Learning with Soft Behavior Regularization
Haoran Xu
Xianyuan Zhan
Jianxiong Li
Honglei Yin
OffRL
26
31
0
14 Oct 2021
Medical Dead-ends and Learning to Identify High-risk States and Treatments
Mehdi Fatemi
Taylor W. Killian
J. Subramanian
Marzyeh Ghassemi
OffRL
30
37
0
08 Oct 2021
Dr Jekyll and Mr Hyde: the Strange Case of Off-Policy Policy Updates
Romain Laroche
Rémi Tachet des Combes
46
8
0
29 Sep 2021
Greedy UnMixing for Q-Learning in Multi-Agent Reinforcement Learning
Chapman Siu
Jason M. Traish
R. Xu
33
2
0
19 Sep 2021
Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
Tianhe Yu
Aviral Kumar
Yevgen Chebotar
Karol Hausman
Sergey Levine
Chelsea Finn
OffRL
35
77
0
16 Sep 2021
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Andrea Zanette
Martin J. Wainwright
Emma Brunskill
OffRL
29
113
0
19 Aug 2021
Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings
Shengpu Tang
Jenna Wiens
OffRL
26
78
0
23 Jul 2021
OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation
Jongmin Lee
Wonseok Jeon
Byung-Jun Lee
J. Pineau
Kee-Eung Kim
OffRL
37
91
0
21 Jun 2021
Offline RL Without Off-Policy Evaluation
David Brandfonbrener
William F. Whitney
Rajesh Ranganath
Joan Bruna
OffRL
42
162
0
16 Jun 2021
Bellman-consistent Pessimism for Offline Reinforcement Learning
Tengyang Xie
Ching-An Cheng
Nan Jiang
Paul Mineiro
Alekh Agarwal
OffRL
LRM
27
270
0
13 Jun 2021
A Minimalist Approach to Offline Reinforcement Learning
Scott Fujimoto
S. Gu
OffRL
58
785
0
12 Jun 2021
Offline Reinforcement Learning as Anti-Exploration
Shideh Rezaeifar
Robert Dadashi
Nino Vieillard
Léonard Hussenot
Olivier Bachem
Olivier Pietquin
M. Geist
OffRL
51
51
0
11 Jun 2021
Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning
Yue Wu
Shuangfei Zhai
Nitish Srivastava
J. Susskind
Jian Zhang
Ruslan Salakhutdinov
Hanlin Goh
EDL
OffRL
OnRL
21
184
0
17 May 2021
Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills
Yevgen Chebotar
Karol Hausman
Yao Lu
Ted Xiao
Dmitry Kalashnikov
...
A. Irpan
Benjamin Eysenbach
Ryan Julian
Chelsea Finn
Sergey Levine
SSL
OffRL
32
146
0
15 Apr 2021
Instabilities of Offline RL with Pre-Trained Neural Representation
Ruosong Wang
Yifan Wu
Ruslan Salakhutdinov
Sham Kakade
OffRL
20
42
0
08 Mar 2021
Offline Reinforcement Learning with Pseudometric Learning
Robert Dadashi
Shideh Rezaeifar
Nino Vieillard
Léonard Hussenot
Olivier Pietquin
M. Geist
OffRL
39
40
0
02 Mar 2021
COMBO: Conservative Offline Model-Based Policy Optimization
Tianhe Yu
Aviral Kumar
Rafael Rafailov
Aravind Rajeswaran
Sergey Levine
Chelsea Finn
OffRL
222
419
0
16 Feb 2021
PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators
Anish Agarwal
Abdullah Alomar
Varkey Alumootil
Devavrat Shah
Dennis Shen
Zhi Xu
Cindy Yang
OffRL
18
18
0
13 Feb 2021
Is Pessimism Provably Efficient for Offline RL?
Ying Jin
Zhuoran Yang
Zhaoran Wang
OffRL
27
349
0
30 Dec 2020
POPO: Pessimistic Offline Policy Optimization
Qiang He
Xinwen Hou
OffRL
35
10
0
26 Dec 2020
Uncertainty-Aware Policy Optimization: A Robust, Adaptive Trust Region Approach
James Queeney
I. Paschalidis
Christos G. Cassandras
31
9
0
19 Dec 2020
Towards Safe Policy Improvement for Non-Stationary MDPs
Yash Chandak
Scott M. Jordan
Georgios Theocharous
Martha White
Philip S. Thomas
OffRL
71
33
0
23 Oct 2020
CoinDICE: Off-Policy Confidence Interval Estimation
Bo Dai
Ofir Nachum
Yinlam Chow
Lihong Li
Csaba Szepesvári
Dale Schuurmans
OffRL
27
84
0
22 Oct 2020
Safety Verification of Model Based Reinforcement Learning Controllers
Akshita Gupta
Inseok Hwang
37
5
0
21 Oct 2020
The Importance of Pessimism in Fixed-Dataset Policy Optimization
Jacob Buckman
Carles Gelada
Marc G. Bellemare
OffRL
42
135
0
15 Sep 2020
Bayesian Robust Optimization for Imitation Learning
Daniel S. Brown
S. Niekum
Marek Petrik
27
32
0
24 Jul 2020
Provably Good Batch Reinforcement Learning Without Great Exploration
Yao Liu
Adith Swaminathan
Alekh Agarwal
Emma Brunskill
OffRL
27
105
0
16 Jul 2020
Off-policy Bandits with Deficient Support
Noveen Sachdeva
Yi-Hsun Su
Thorsten Joachims
OffRL
32
75
0
16 Jun 2020
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
Justin Fu
Aviral Kumar
Ofir Nachum
George Tucker
Sergey Levine
GP
OffRL
75
1,315
0
15 Apr 2020