Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space

Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space

Papers citing "Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space"

50 / 86 papers shown
Title
SOS! Soft Prompt Attack Against Open-Source Large Language Models
SOS! Soft Prompt Attack Against Open-Source Large Language Models
Ziqing Yang
Michael Backes
Yang Zhang
Ahmed Salem
58
5
0
03 Jul 2024