Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation

8 March 2024
Xiaoying Zhang, Jean-François Ton, Wei Shen, Hongning Wang, Yang Liu
arXiv: 2403.05171

Papers citing "Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation" (3 papers)

The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking
Yuchun Miao, Sen Zhang, Liang Ding, Yuqi Zhang, Lefei Zhang, Dacheng Tao
31 Jan 2025
Inverse-RLignment: Large Language Model Alignment from Demonstrations through Inverse Reinforcement Learning
Hao Sun, M. van der Schaar
28 Jan 2025
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose H. Blanchet, Zhaoran Wang
26 May 2024