When is Realizability Sufficient for Off-Policy Reinforcement Learning?

10 November 2022
Andrea Zanette
    OffRL

Papers citing "When is Realizability Sufficient for Off-Policy Reinforcement Learning?"

15 / 15 papers shown
Quantum Non-Linear Bandit Optimization
Zakaria Shams Siam
Chaowen Guan
Chong Liu
34
0
0
04 Mar 2025
Enhancing PPO with Trajectory-Aware Hybrid Policies
Qisai Liu
Zhanhong Jiang
Hsin-Jung Yang
Mahsa Khosravi
Joshua R. Waite
S. Sarkar
49
0
0
21 Feb 2025
A Model Selection Approach for Corruption Robust Reinforcement Learning
Chen-Yu Wei
Christoph Dann
Julian Zimmert
88
44
0
31 Dec 2024
OMPO: A Unified Framework for RL under Policy and Dynamics Shifts
Yu-Juan Luo
Tianying Ji
Gang Hua
Jianwei Zhang
Huazhe Xu
Xianyuan Zhan
OffRL
64
3
0
29 May 2024
Optimal Design for Human Feedback
Subhojyoti Mukherjee
Anusha Lalitha
Kousha Kalantari
Aniket Deshmukh
Ge Liu
Yifei Ma
B. Kveton
44
0
0
22 Apr 2024
A Natural Extension To Online Algorithms For Hybrid RL With Limited Coverage
Kevin Tan
Ziping Xu
OffRL
OnRL
42
4
0
07 Mar 2024
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
Yifei Zhou
Andrea Zanette
Jiayi Pan
Sergey Levine
Aviral Kumar
65
50
0
29 Feb 2024
Regularized Q-Learning with Linear Function Approximation
Jiachen Xi
Alfredo Garcia
P. Momcilovic
38
2
0
26 Jan 2024
Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning
Zhaoyi Zhou
Chuning Zhu
Runlong Zhou
Qiwen Cui
Abhishek Gupta
S. S. Du
OffRL
40
8
0
30 Oct 2023
Sample-Efficiency in Multi-Batch Reinforcement Learning: The Need for Dimension-Dependent Adaptivity
Emmeran Johnson
Ciara Pike-Burke
Patrick Rebeschini
OffRL
27
2
0
02 Oct 2023
Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems
Xiang Ji
Huazheng Wang
Minshuo Chen
Tuo Zhao
Mengdi Wang
OffRL
37
6
0
24 Jul 2023
Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data
Ruiqi Zhang
Andrea Zanette
OffRL
OnRL
40
7
0
10 Jul 2023
A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning
Kihyuk Hong
Yuhang Li
Ambuj Tewari
OffRL
26
7
0
13 Jun 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
42
181
0
26 Jan 2023
Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian
Paria Rashidinejad
Hanlin Zhu
Kunhe Yang
Stuart J. Russell
Jiantao Jiao
OffRL
45
26
0
01 Nov 2022