Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2211.05311
Cited By
When is Realizability Sufficient for Off-Policy Reinforcement Learning?
10 November 2022
Andrea Zanette
OffRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"When is Realizability Sufficient for Off-Policy Reinforcement Learning?"
15 / 15 papers shown
Title
Quantum Non-Linear Bandit Optimization
Zakaria Shams Siam
Chaowen Guan
Chong Liu
34
0
0
04 Mar 2025
Enhancing PPO with Trajectory-Aware Hybrid Policies
Qisai Liu
Zhanhong Jiang
Hsin-Jung Yang
Mahsa Khosravi
Joshua R. Waite
S. Sarkar
49
0
0
21 Feb 2025
A Model Selection Approach for Corruption Robust Reinforcement Learning
Chen-Yu Wei
Christoph Dann
Julian Zimmert
88
44
0
31 Dec 2024
OMPO: A Unified Framework for RL under Policy and Dynamics Shifts
Yu-Juan Luo
Tianying Ji
Gang Hua
Jianwei Zhang
Huazhe Xu
Xianyuan Zhan
OffRL
64
3
0
29 May 2024
Optimal Design for Human Feedback
Subhojyoti Mukherjee
Anusha Lalitha
Kousha Kalantari
Aniket Deshmukh
Ge Liu
Yifei Ma
B. Kveton
48
0
0
22 Apr 2024
A Natural Extension To Online Algorithms For Hybrid RL With Limited Coverage
Kevin Tan
Ziping Xu
OffRL
OnRL
42
4
0
07 Mar 2024
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
Yifei Zhou
Andrea Zanette
Jiayi Pan
Sergey Levine
Aviral Kumar
65
50
0
29 Feb 2024
Regularized Q-Learning with Linear Function Approximation
Jiachen Xi
Alfredo Garcia
P. Momcilovic
38
2
0
26 Jan 2024
Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning
Zhaoyi Zhou
Chuning Zhu
Runlong Zhou
Qiwen Cui
Abhishek Gupta
S. S. Du
OffRL
40
8
0
30 Oct 2023
Sample-Efficiency in Multi-Batch Reinforcement Learning: The Need for Dimension-Dependent Adaptivity
Emmeran Johnson
Ciara Pike-Burke
Patrick Rebeschini
OffRL
27
2
0
02 Oct 2023
Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems
Xiang Ji
Huazheng Wang
Minshuo Chen
Tuo Zhao
Mengdi Wang
OffRL
37
6
0
24 Jul 2023
Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data
Ruiqi Zhang
Andrea Zanette
OffRL
OnRL
40
7
0
10 Jul 2023
A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning
Kihyuk Hong
Yuhang Li
Ambuj Tewari
OffRL
26
7
0
13 Jun 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or
K
K
K
-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
42
183
0
26 Jan 2023
Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian
Paria Rashidinejad
Hanlin Zhu
Kunhe Yang
Stuart J. Russell
Jiantao Jiao
OffRL
45
26
0
01 Nov 2022
1