Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

12 April 2022

Deep Ganguli

Nicholas Joseph

Saurav Kadavath

Zac Hatfield-Dodds

Danny Hernandez

Scott R. Johnston

Catherine Olsson
