2306.03423
Cited By
I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models
6 June 2023
Max Reuter
William B. Schulze
Papers citing "I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models"
4 / 4 papers shown
Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs
Alexander von Recum
Christoph Schnabl
Gabor Hollbeck
Silas Alberti
Philip Blinde
Marvin von Hagen
92
2
0
22 Dec 2024
DiDOTS: Knowledge Distillation from Large-Language-Models for Dementia Obfuscation in Transcribed Speech
Dominika Woszczyk
Soteris Demetriou
30
0
0
05 Oct 2024
Programming Refusal with Conditional Activation Steering
Bruce W. Lee
Inkit Padhi
Karthikeyan N. Ramamurthy
Erik Miehling
Pierre Dognin
Manish Nagireddy
Amit Dhurandhar
LLMSV
108
15
0
06 Sep 2024
SoK: Memorization in General-Purpose Large Language Models
Valentin Hartmann
Anshuman Suri
Vincent Bindschaedler
David Evans
Shruti Tople
Robert West
KELM
LLMAG
29
20
0
24 Oct 2023