ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

On The Fragility of Learned Reward Functions
arXiv: 2301.03652

9 January 2023
Lev McKinney
Yawen Duan
David M. Krueger
Adam Gleave

Papers citing "On The Fragility of Learned Reward Functions"

5 / 5 papers shown
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Wei Shen
Guanlin Liu
Zheng Wu
Ruofei Zhu
Qingping Yang
Chao Xin
Yu Yue
Lin Yan
28 Mar 2025
HAF-RM: A Hybrid Alignment Framework for Reward Model Training
Shujun Liu
Xiaoyu Shen
Yuhang Lai
Siyuan Wang
Shengbin Yue
Zengfeng Huang
Xuanjing Huang
Zhongyu Wei
04 Jul 2024
Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents
San Kim
Gary Geunbae Lee
AAML
21 May 2024
Learning to Watermark LLM-generated Text via Reinforcement Learning
Xiaojun Xu
Yuanshun Yao
Yang Liu
13 Mar 2024
Compositional preference models for aligning LMs
Dongyoung Go
Tomasz Korbak
Germán Kruszewski
Jos Rozen
Marc Dymetman
17 Oct 2023