Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions
24 October 2022
Haanvid Lee, Jongmin Lee, Yunseon Choi, Wonseok Jeon, Byung-Jun Lee, Yung-Kyun Noh, Kee-Eung Kim
Community: OffRL
arXiv: 2210.13373

Papers cited by "Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions" (14 of 14 shown)

Deep Jump Learning for Off-Policy Evaluation in Continuous Treatment Settings
Hengrui Cai, C. Shi, R. Song, Wenbin Lu · OffRL · 29 Oct 2020

Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita · OffRL · 17 Aug 2020

Adaptive Estimator Selection for Off-Policy Evaluation
Yi-Hsun Su, Pavithra Srinath, A. Krishnamurthy · OffRL · 18 Feb 2020

Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning
Cameron Voloshin, Hoang Minh Le, Nan Jiang, Yisong Yue · OffRL · 15 Nov 2019

Policy Evaluation and Optimization with Continuous Treatments
Nathan Kallus, Angela Zhou · OffRL · 16 Feb 2018

More Robust Doubly Robust Off-policy Evaluation
Mehrdad Farajtabar, Yinlam Chow, Mohammad Ghavamzadeh · OffRL · 10 Feb 2018

Reinforcement Learning with Deep Energy-Based Policies
Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, Sergey Levine · 27 Feb 2017

Dynamic Pricing with Demand Covariates
Sheng Qiang, Mohsen Bayati · 25 Apr 2016

Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
Philip S. Thomas, Emma Brunskill · OffRL · 04 Apr 2016

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
Nan Jiang, Lihong Li · OffRL · 11 Nov 2015

Doubly Robust Policy Evaluation and Optimization
Miroslav Dudík, D. Erhan, John Langford, Lihong Li · OffRL · 10 Mar 2015

A Survey on Metric Learning for Feature Vectors and Structured Data
A. Bellet, Amaury Habrard, M. Sebban · 28 Jun 2013

Doubly Robust Policy Evaluation and Learning
Miroslav Dudík, John Langford, Lihong Li · OffRL · 23 Mar 2011

Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
Lihong Li, Wei Chu, John Langford, Xuanhui Wang · OffRL · 31 Mar 2010