2302.07459
Cited By
The Capacity for Moral Self-Correction in Large Language Models
15 February 2023
Deep Ganguli
Amanda Askell
Nicholas Schiefer
Thomas I. Liao
Kamilė Lukošiūtė
Anna Chen
Anna Goldie
Azalia Mirhoseini
Catherine Olsson
Danny Hernandez
Dawn Drain
Dustin Li
Eli Tran-Johnson
Ethan Perez
John Kernion
Jamie Kerr
J. Mueller
J. Landau
Kamal Ndousse
Karina Nguyen
Liane Lovitt
Michael Sellitto
Nelson Elhage
Noemí Mercado
Nova DasSarma
Oliver Rausch
R. Lasenby
Robin Larson
Sam Ringer
Sandipan Kundu
Saurav Kadavath
Scott Johnston
Shauna Kravec
S. E. Showk
Tamera Lanham
Timothy Telleen-Lawton
T. Henighan
Tristan Hume
Yuntao Bai
Zac Hatfield-Dodds
Benjamin Mann
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
C. Olah
Jack Clark
Sam Bowman
Jared Kaplan
LRM
ReLM
Papers citing
"The Capacity for Moral Self-Correction in Large Language Models"
15 / 115 papers shown
Title
Comparing Machines and Children: Using Developmental Psychology Experiments to Assess the Strengths and Weaknesses of LaMDA Responses
Eliza Kosoy
Emily Rose Reagan
Leslie Y. Lai
Alison Gopnik
Danielle Krettek Cobb
24
9
0
18 May 2023
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Miles Turpin
Julian Michael
Ethan Perez
Sam Bowman
ReLM
LRM
27
378
0
07 May 2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Zhiqing Sun
Yikang Shen
Qinhong Zhou
Hongxin Zhang
Zhenfang Chen
David D. Cox
Yiming Yang
Chuang Gan
SyDa
ALM
25
313
0
04 May 2023
Teaching Large Language Models to Self-Debug
Xinyun Chen
Maxwell Lin
Nathanael Schärli
Denny Zhou
LRM
36
639
0
11 Apr 2023
Eight Things to Know about Large Language Models
Sam Bowman
ALM
25
113
0
02 Apr 2023
Language Models can Solve Computer Tasks
Geunwoo Kim
Pierre Baldi
Stephen Marcus McAleer
LLMAG
LM&Ro
40
338
0
30 Mar 2023
Perspectives on the Social Impacts of Reinforcement Learning with Human Feedback
Gabrielle K. Liu
OffRL
26
21
0
06 Mar 2023
Large language models predict human sensory judgments across six modalities
Raja Marjieh
Ilia Sucholutsky
Pol van Rijn
Nori Jacoby
Thomas L. Griffiths
VLM
22
31
0
02 Feb 2023
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
227
500
0
28 Sep 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
225
443
0
23 Aug 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
307
4,077
0
24 May 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
355
8,457
0
28 Jan 2022
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
212
367
0
15 Oct 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
259
374
0
28 Feb 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
228
4,460
0
23 Jan 2020