ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2104.02293
  4. Cited By
On the Optimality of Batch Policy Optimization Algorithms

On the Optimality of Batch Policy Optimization Algorithms

6 April 2021
Chenjun Xiao
Yifan Wu
Tor Lattimore
Bo Dai
Jincheng Mei
Lihong Li
Csaba Szepesvári
Dale Schuurmans
    OffRL
ArXivPDFHTML

Papers citing "On the Optimality of Batch Policy Optimization Algorithms"

23 / 23 papers shown
Title
NeuroSep-CP-LCB: A Deep Learning-based Contextual Multi-armed Bandit Algorithm with Uncertainty Quantification for Early Sepsis Prediction
NeuroSep-CP-LCB: A Deep Learning-based Contextual Multi-armed Bandit Algorithm with Uncertainty Quantification for Early Sepsis Prediction
Anni Zhou
Raheem Beyah
Rishikesan Kamaleswaran
46
0
0
20 Mar 2025
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen
Jincheng Mei
Katayoon Goshvadi
Hanjun Dai
Tong Yang
Sherry Yang
Dale Schuurmans
Yuejie Chi
Bo Dai
OffRL
67
24
0
20 Feb 2025
Importance-Weighted Offline Learning Done Right
Importance-Weighted Offline Learning Done Right
Germano Gabbianelli
Gergely Neu
Matteo Papini
OffRL
21
5
0
27 Sep 2023
Fast and Regret Optimal Best Arm Identification: Fundamental Limits and
  Low-Complexity Algorithms
Fast and Regret Optimal Best Arm Identification: Fundamental Limits and Low-Complexity Algorithms
Qining Zhang
Lei Ying
72
4
0
01 Sep 2023
Supervised Pretraining Can Learn In-Context Reinforcement Learning
Supervised Pretraining Can Learn In-Context Reinforcement Learning
Jonathan Lee
Annie Xie
Aldo Pacchiano
Yash Chandak
Chelsea Finn
Ofir Nachum
Emma Brunskill
OffRL
35
76
0
26 Jun 2023
Optimal Best-Arm Identification in Bandits with Access to Offline Data
Optimal Best-Arm Identification in Bandits with Access to Offline Data
Shubhada Agrawal
Sandeep Juneja
Karthikeyan Shanmugam
A. Suggala
24
5
0
15 Jun 2023
Bayesian Regret Minimization in Offline Bandits
Bayesian Regret Minimization in Offline Bandits
Marek Petrik
Guy Tennenholtz
Mohammad Ghavamzadeh
OffRL
36
0
0
02 Jun 2023
Offline Primal-Dual Reinforcement Learning for Linear MDPs
Offline Primal-Dual Reinforcement Learning for Linear MDPs
Germano Gabbianelli
Gergely Neu
Nneka Okolo
Matteo Papini
OffRL
29
7
0
22 May 2023
The In-Sample Softmax for Offline Reinforcement Learning
The In-Sample Softmax for Offline Reinforcement Learning
Chenjun Xiao
Han Wang
Yangchen Pan
Adam White
Martha White
OffRL
29
26
0
28 Feb 2023
VIPeR: Provably Efficient Algorithm for Offline RL with Neural Function
  Approximation
VIPeR: Provably Efficient Algorithm for Offline RL with Neural Function Approximation
Thanh Nguyen-Tang
R. Arora
OffRL
48
5
0
24 Feb 2023
Adversarial Model for Offline Reinforcement Learning
Adversarial Model for Offline Reinforcement Learning
M. Bhardwaj
Tengyang Xie
Byron Boots
Nan Jiang
Ching-An Cheng
AAML
OffRL
48
26
0
21 Feb 2023
Model-based Offline Reinforcement Learning with Local Misspecification
Model-based Offline Reinforcement Learning with Local Misspecification
Kefan Dong
Yannis Flet-Berliac
Allen Nie
Emma Brunskill
OffRL
18
4
0
26 Jan 2023
ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies
  with Offline Data
ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data
Tengyang Xie
M. Bhardwaj
Nan Jiang
Ching-An Cheng
OffRL
28
9
0
08 Nov 2022
Online Learning with Off-Policy Feedback
Online Learning with Off-Policy Feedback
Germano Gabbianelli
Matteo Papini
Gergely Neu
OffRL
30
4
0
18 Jul 2022
Pessimism for Offline Linear Contextual Bandits using $\ell_p$
  Confidence Sets
Pessimism for Offline Linear Contextual Bandits using ℓp\ell_pℓp​ Confidence Sets
Gen Li
Cong Ma
Nathan Srebro
OffRL
36
12
0
21 May 2022
Near-optimal Offline Reinforcement Learning with Linear Representation:
  Leveraging Variance Information with Pessimism
Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism
Ming Yin
Yaqi Duan
Mengdi Wang
Yu Wang
OffRL
34
66
0
11 Mar 2022
Model Selection in Batch Policy Optimization
Model Selection in Batch Policy Optimization
Jonathan Lee
George Tucker
Ofir Nachum
Bo Dai
OffRL
27
12
0
23 Dec 2021
Offline Neural Contextual Bandits: Pessimism, Optimization and
  Generalization
Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization
Thanh Nguyen-Tang
Sunil R. Gupta
A. Nguyen
Svetha Venkatesh
OffRL
34
29
0
27 Nov 2021
Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
Ming Yin
Yu Wang
OffRL
29
82
0
17 Oct 2021
The Curse of Passive Data Collection in Batch Reinforcement Learning
The Curse of Passive Data Collection in Batch Reinforcement Learning
Chenjun Xiao
Ilbin Lee
Bo Dai
Dale Schuurmans
Csaba Szepesvári
OffRL
33
1
0
18 Jun 2021
Optimal Uniform OPE and Model-based Offline Reinforcement Learning in
  Time-Homogeneous, Reward-Free and Task-Agnostic Settings
Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
Ming Yin
Yu Wang
OffRL
32
19
0
13 May 2021
Towards Theoretical Understandings of Robust Markov Decision Processes:
  Sample Complexity and Asymptotics
Towards Theoretical Understandings of Robust Markov Decision Processes: Sample Complexity and Asymptotics
Wenhao Yang
Liangyu Zhang
Zhihua Zhang
28
33
0
09 May 2021
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on
  Open Problems
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Sergey Levine
Aviral Kumar
George Tucker
Justin Fu
OffRL
GP
343
1,968
0
04 May 2020
1