2412.06748
Cited By
Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models
9 December 2024
Neel Jain
Aditya Shrivastava
Chenyang Zhu
Daben Liu
Alfy Samuel
Ashwinee Panda
Anoop Kumar
Micah Goldblum
Tom Goldstein
Papers citing
"Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models"
1 / 1 papers shown
Safety Pretraining: Toward the Next Generation of Safe AI
Pratyush Maini
Sachin Goyal
Dylan Sam
Alex Robey
Yash Savani
Yiding Jiang
Andy Zou
Zachary C. Lipton
J. Zico Kolter
23 Apr 2025