2402.10342
Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization
15 February 2024
Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, R. Srikant
Papers citing
"Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization"
2 / 2 papers shown
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang, Bingcong Li, Christoph Dann, Niao He
OffRL
228
3
0
26 Feb 2025
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang, Honghao Wei, Lei Ying
OffRL
103
2
0
11 Jun 2024