2205.11140
Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
23 May 2022
Xiaoyu Chen
Han Zhong
Zhuoran Yang
Zhaoran Wang
Liwei Wang
Papers citing
"Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation"
14 / 14 papers shown
Title
Learning Guarantee of Reward Modeling Using Deep Neural Networks
Yuanhang Luo
Yeheng Ge
Ruijian Han
Guohao Shen
34
0
0
10 May 2025
Contextual Online Uncertainty-Aware Preference Learning for Human Feedback
Nan Lu
Ethan X. Fang
Junwei Lu
143
0
0
27 Apr 2025
Reinforcement Learning from Multi-level and Episodic Human Feedback
Muhammad Qasim Elahi
Somtochukwu Oguchienti
Maheed H. Ahmed
Mahsa Ghasemi
OffRL
50
0
0
20 Apr 2025
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
85
0
0
26 Feb 2025
RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner
Fu-Chieh Chang
Yu-Ting Lee
Hui-Ying Shih
Pei-Yuan Wu
OffRL
LRM
156
0
0
31 Oct 2024
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang
Lei Ying
OffRL
37
2
0
25 Sep 2024
Preference-Guided Reinforcement Learning for Efficient Exploration
Guojian Wang
Faguo Wu
Xiao Zhang
Tianyuan Chen
Xuyang Chen
Lin Zhao
40
0
0
09 Jul 2024
Preference Elicitation for Offline Reinforcement Learning
Alizée Pace
Bernhard Schölkopf
Gunnar Rätsch
Giorgia Ramponi
OffRL
63
1
0
26 Jun 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
64
1
0
11 Jun 2024
Comparisons Are All You Need for Optimizing Smooth Functions
Chenyi Zhang
Tongyang Li
AAML
34
1
0
19 May 2024
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
Ruitao Chen
Liwei Wang
72
1
0
18 May 2024
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
Asaf B. Cassel
Haipeng Luo
Aviv A. Rosenberg
Dmitry Sotnikov
OffRL
31
3
0
13 May 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
31
25
0
29 Jan 2024
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
Zihao Li
Zhuoran Yang
Mengdi Wang
OffRL
29
54
0
29 May 2023