1808.00232
Cited By
Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy
1 August 2018
Yuan Xie, Boyi Liu, Qiang Liu, Zhaoran Wang, Yuanshuo Zhou, Jian-wei Peng
OffRL
Papers citing "Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy"
2 / 2 papers shown
Title
Benchmarks for Deep Off-Policy Evaluation
Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyun Wang, ..., Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, T. Paine
ELM
OffRL
35
100
0
30 Mar 2021
Reducing Sampling Error in Batch Temporal Difference Learning
Brahma S. Pavse, Ishan Durugkar, Josiah P. Hanna, Peter Stone
OffRL
22
12
0
15 Aug 2020