ResearchTrend.AI
Batch Policy Learning under Constraints

20 March 2019
Hoang Minh Le
Cameron Voloshin
Yisong Yue
    OffRL

Papers citing "Batch Policy Learning under Constraints"

50 / 90 papers shown
Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data
Rui Miao
Babak Shahbaba
Annie Qu
OffRL
36
0
0
14 May 2025
Fine-Tuning without Performance Degradation
Han Wang
Adam White
Martha White
OnRL
274
0
0
01 May 2025
Reinforcement Learning with Continuous Actions Under Unmeasured Confounding
Yuhan Li
Eugene Han
Yifan Hu
Wenzhuo Zhou
Zhengling Qi
Yifan Cui
Ruoqing Zhu
OffRL
251
0
0
01 May 2025
Statistical Inference in Reinforcement Learning: A Selective Survey
Chengchun Shi
OffRL
74
1
0
22 Feb 2025
Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol
Pai Liu
Lingfeng Zhao
Shivangi Agarwal
Jinghan Liu
Audrey Huang
Philip Amortila
Nan Jiang
OODD
OffRL
109
0
0
11 Feb 2025
Temporal Logic Specification-Conditioned Decision Transformer for Offline Safe Reinforcement Learning
Zijian Guo
Weichao Zhou
Wenchao Li
OffRL
105
2
0
28 Jan 2025
Counterfactually Fair Reinforcement Learning via Sequential Data Preprocessing
Jitao Wang
C. Shi
John D. Piette
Joshua R. Loftus
Donglin Zeng
Zhenke Wu
OffRL
66
0
0
10 Jan 2025
Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning
Claire Chen
Shuze Liu
Shangtong Zhang
OffRL
210
1
0
08 Oct 2024
Doubly Optimal Policy Evaluation for Reinforcement Learning
Shuze Liu
Claire Chen
Shangtong Zhang
OffRL
48
2
0
03 Oct 2024
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
Toshinori Kitamura
Tadashi Kozuno
Wataru Kumagai
Kenta Hoshino
Y. Hosoe
Kazumi Kasaura
Masashi Hamaya
Paavo Parmas
Yutaka Matsuo
77
1
0
29 Aug 2024
Three Dogmas of Reinforcement Learning
David Abel
Mark K. Ho
Anna Harutyunyan
49
5
0
15 Jul 2024
Short-Long Policy Evaluation with Novel Actions
Hyunji Alex Nam
Yash Chandak
Emma Brunskill
OffRL
29
0
0
04 Jul 2024
To Switch or Not to Switch? Balanced Policy Switching in Offline Reinforcement Learning
Tao Ma
Xuzhi Yang
Zoltan Szabo
OffRL
73
0
0
01 Jul 2024
LTL-Constrained Policy Optimization with Cycle Experience Replay
Ameesh Shah
Cameron Voloshin
Chenxi Yang
Abhinav Verma
Swarat Chaudhuri
S. Seshia
36
1
0
17 Apr 2024
On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation
Yuheng Zhang
Nan Jiang
OffRL
41
4
0
22 Feb 2024
A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees
Toshinori Kitamura
Tadashi Kozuno
Masahiro Kato
Yuki Ichihara
Soichiro Nishimori
Akiyoshi Sannai
Sho Sonoda
Wataru Kumagai
Yutaka Matsuo
52
2
0
31 Jan 2024
HiBid: A Cross-Channel Constrained Bidding System with Budget Allocation by Hierarchical Offline Deep Reinforcement Learning
Hao Wang
Bo Tang
Chi Harold Liu
Shangqin Mao
Jiahong Zhou
Zipeng Dai
Yaqi Sun
Qianlong Xie
Xingxing Wang
Dong Wang
OffRL
43
3
0
29 Dec 2023
Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation
Paul Daoudi
Mathias Formoso
Othman Gaizi
Achraf Azize
Evrard Garcelon
OffRL
31
0
0
24 Dec 2023
End-to-end Offline Reinforcement Learning for Glycemia Control
Tristan Beolet
Alice Adenis
E. Huneker
Maxime Louis
OffRL
38
1
0
16 Oct 2023
Provably Efficient Exploration in Constrained Reinforcement Learning: Posterior Sampling Is All You Need
Danil Provodin
Pratik Gajane
Mykola Pechenizkiy
M. Kaptein
41
0
0
27 Sep 2023
Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding
Alizée Pace
Hugo Yèche
Bernhard Schölkopf
Gunnar Rätsch
Guy Tennenholtz
OffRL
31
6
0
01 Jun 2023
Safe Offline Reinforcement Learning with Real-Time Budget Constraints
Qian Lin
Bo Tang
Zifan Wu
Chao Yu
Shangqin Mao
Qianlong Xie
Xingxing Wang
Dong Wang
OffRL
44
11
0
01 Jun 2023
Towards Real-World Applications of Personalized Anesthesia Using Policy Constraint Q Learning for Propofol Infusion Control
Xiuding Cai
Jiao Chen
Yaoyao Zhu
Beiming Wang
Yu Yao
OffRL
41
5
0
17 Mar 2023
VIPeR: Provably Efficient Algorithm for Offline RL with Neural Function Approximation
Thanh Nguyen-Tang
R. Arora
OffRL
58
5
0
24 Feb 2023
Why Target Networks Stabilise Temporal Difference Methods
Matt Fellows
Matthew Smith
Shimon Whiteson
OOD
AAML
21
7
0
24 Feb 2023
HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare
Ge Gao
Song Ju
Markel Sanz Ausin
Min Chi
OffRL
34
8
0
18 Feb 2023
A Reinforcement Learning Framework for Dynamic Mediation Analysis
Linjuan Ge
Jitao Wang
C. Shi
Zhanghua Wu
Rui Song
31
5
0
31 Jan 2023
Variational Latent Branching Model for Off-Policy Evaluation
Qitong Gao
Ge Gao
Min Chi
Miroslav Pajic
OffRL
41
6
0
28 Jan 2023
Offline Policy Optimization in RL with Variance Regularizaton
Riashat Islam
Samarth Sinha
Homanga Bharadhwaj
Samin Yeasar Arnob
Zhuoran Yang
Animesh Garg
Zhaoran Wang
Lihong Li
Doina Precup
OffRL
33
0
0
29 Dec 2022
Safe Evaluation For Offline Learning: Are We Ready To Deploy?
Hager Radi
Josiah P. Hanna
Peter Stone
Matthew E. Taylor
OffRL
ELM
39
0
0
16 Dec 2022
Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning
Guoxi Zhang
H. Kashima
OffRL
34
2
0
29 Nov 2022
On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation
Thanh Nguyen-Tang
Ming Yin
Sunil R. Gupta
Svetha Venkatesh
R. Arora
OffRL
60
16
0
23 Nov 2022
Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions
Audrey Huang
Nan Jiang
OffRL
62
9
0
27 Oct 2022
Constrained Update Projection Approach to Safe Policy Optimization
Long Yang
Jiaming Ji
Juntao Dai
Linrui Zhang
Binbin Zhou
Pengfei Li
Yaodong Yang
Gang Pan
41
43
0
15 Sep 2022
Multi-Task Fusion via Reinforcement Learning for Long-Term User Satisfaction in Recommender Systems
Qihua Zhang
Junning Liu
Yuzhuo Dai
Yiyan Qi
Yifan Yuan
Kunlun Zheng
Fan Huang
Xianfeng Tan
OffRL
35
50
0
09 Aug 2022
Future-Dependent Value-Based Off-Policy Evaluation in POMDPs
Masatoshi Uehara
Haruka Kiyohara
Andrew Bennett
Victor Chernozhukov
Nan Jiang
Nathan Kallus
C. Shi
Wen Sun
OffRL
34
16
0
26 Jul 2022
A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP
Fan Chen
Junyu Zhang
Zaiwen Wen
OffRL
41
8
0
13 Jul 2022
A Review of Safe Reinforcement Learning: Methods, Theory and Applications
Shangding Gu
Longyu Yang
Yali Du
Guang Chen
Florian Walter
Jun Wang
Alois C. Knoll
OffRL
AI4TS
117
243
0
20 May 2022
Model-Free and Model-Based Policy Evaluation when Causality is Uncertain
David Bruns-Smith
CML
ELM
OffRL
24
12
0
02 Apr 2022
Safe Neurosymbolic Learning with Differentiable Symbolic Execution
Chenxi Yang
Swarat Chaudhuri
32
9
0
15 Mar 2022
Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism
Ming Yin
Yaqi Duan
Mengdi Wang
Yu Wang
OffRL
39
66
0
11 Mar 2022
Testing Stationarity and Change Point Detection in Reinforcement Learning
Mengbing Li
C. Shi
Zhanghua Wu
Piotr Fryzlewicz
OffRL
47
9
0
03 Mar 2022
Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process
C. Shi
Jin Zhu
Ye Shen
Shuang Luo
Hong Zhu
R. Song
OffRL
38
30
0
22 Feb 2022
Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory
Ruiqi Zhang
Xuezhou Zhang
Chengzhuo Ni
Mengdi Wang
OffRL
40
16
0
10 Feb 2022
Tutorial on amortized optimization
Brandon Amos
OffRL
78
44
0
01 Feb 2022
Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error
Scott Fujimoto
David Meger
Doina Precup
Ofir Nachum
S. Gu
32
32
0
28 Jan 2022
Hyperparameter Selection Methods for Fitted Q-Evaluation with Error Guarantee
Kohei Miyaguchi
OffRL
43
1
0
07 Jan 2022
Off Environment Evaluation Using Convex Risk Minimization
Pulkit Katdare
Shuijing Liu
Katherine Driggs-Campbell
18
2
0
21 Dec 2021
Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning
Rujie Zhong
Duohan Zhang
Lukas Schafer
Stefano V. Albrecht
Josiah P. Hanna
OOD
OffRL
15
12
0
29 Nov 2021
Pessimistic Model Selection for Offline Deep Reinforcement Learning
Chao-Han Huck Yang
Zhengling Qi
Yifan Cui
Pin-Yu Chen
OffRL
46
4
0
29 Nov 2021