Learning a Pessimistic Reward Model in RLHF
arXiv: 2505.20556 · 26 May 2025
Yinglun Xu, Hangoo Kang, Tarun Suresh, Yuxuan Wan, Gagandeep Singh
OffRL
Papers citing "Learning a Pessimistic Reward Model in RLHF" (2 papers)
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
Shenao Zhang, Donghan Yu, Hiteshi Sharma, Ziyi Yang, Shuohang Wang, Hany Hassan, Zhaoran Wang
LRM · 29 May 2024
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose H. Blanchet, Zhaoran Wang
26 May 2024