2406.19552
Cited By
Rethinking harmless refusals when fine-tuning foundation models
27 June 2024
Florin Pop
Judd Rosenblatt
Diogo Schwerz de Lucena
Michael Vaiana
Papers citing
"Rethinking harmless refusals when fine-tuning foundation models"
4 / 4 papers shown
Title
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa
MoMe
152
1,583
0
15 Dec 2022
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
44
627
0
07 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
582
9,009
0
28 Jan 2022
A General Language Assistant as a Laboratory for Alignment
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
...
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
ALM
94
762
0
01 Dec 2021