Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2204.05862
Cited By
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
12 April 2022
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
Nova Dassarma
Dawn Drain
Stanislav Fort
Deep Ganguli
T. Henighan
Nicholas Joseph
Saurav Kadavath
John Kernion
Tom Conerly
S. E. Showk
Nelson Elhage
Zac Hatfield-Dodds
Danny Hernandez
Tristan Hume
Scott R. Johnston
Shauna Kravec
Liane Lovitt
Neel Nanda
Catherine Olsson
Dario Amodei
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
4 / 654 papers shown
Title
X-Risk Analysis for AI Research
Dan Hendrycks
Mantas Mazeika
97
71
0
13 Jun 2022
Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress
Rishabh Agarwal
Max Schwarzer
Pablo Samuel Castro
Rameswar Panda
Marc G. Bellemare
OffRL
OnRL
126
66
0
03 Jun 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
1.2K
13,290
0
04 Mar 2022
Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness
Jeremiah Zhe Liu
Zi Lin
Shreyas Padhy
Dustin Tran
Tania Bedrax-Weiss
Balaji Lakshminarayanan
UQCV
BDL
293
453
0
17 Jun 2020
Previous
1
2
3
...
12
13
14