Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation
arXiv:2403.05171 (v2, latest) · 8 March 2024
Xiaoying Zhang, Jean-François Ton, Wei Shen, Hongning Wang, Yang Liu
Papers citing "Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation" (3 of 3 papers shown)
The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking
Yuchun Miao, Sen Zhang, Liang Ding, Yuqi Zhang, Lefei Zhang, Dacheng Tao
31 Jan 2025
Inverse-RLignment: Large Language Model Alignment from Demonstrations through Inverse Reinforcement Learning
Hao Sun, M. van der Schaar
28 Jan 2025
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose H. Blanchet, Zhaoran Wang
26 May 2024