Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization

15 February 2024

Gal Dalal

R. Srikant

Papers citing "Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization"

Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang, Bingcong Li, Christoph Dann, Niao He
OffRL | 228 | 3 | 0 | 26 Feb 2025

Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang, Honghao Wei, Lei Ying
OffRL | 103 | 2 | 0 | 11 Jun 2024