ResearchTrend.AI
Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking


5 March 2024
Cassidy Laidlaw
Shivam Singhal
Anca Dragan
    AAML
arXiv: 2403.03185

Papers citing "Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking"

5 / 5 papers shown
Learning to Assist Humans without Inferring Rewards
Vivek Myers
Evan Ellis
Sergey Levine
Benjamin Eysenbach
Anca Dragan
43
2
0
17 Jan 2025
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
227
502
0
28 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
328
11,953
0
04 Mar 2022
Reward (Mis)design for Autonomous Driving
W. B. Knox
A. Allievi
Holger Banzhaf
Felix Schmitt
Peter Stone
83
113
0
28 Apr 2021
Reinforcement Learning for Optimization of COVID-19 Mitigation policies
Varun Kompella
Roberto Capobianco
Stacy Jong
Jonathan Browne
S. Fox
L. Meyers
Peter R. Wurman
Peter Stone
75
47
0
20 Oct 2020