2310.04373
Confronting Reward Model Overoptimization with Constrained RLHF
6 October 2023
Ted Moskovitz, Aaditya K. Singh, DJ Strouse, T. Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen Marcus McAleer
Papers citing "Confronting Reward Model Overoptimization with Constrained RLHF"
2 / 52 papers shown
Title
Extracting Training Data from Large Language Models
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, D. Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel
MLAU
SILM
290
1,831
0
14 Dec 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving
ALM
301
1,616
0
18 Sep 2019