arXiv:2505.15738
Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses
21 May 2025
Xiaoxue Yang, Bozhidar Stevanoski, Matthieu Meeus, Yves-Alexandre de Montjoye
AAML
Papers citing "Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses" (2 papers)
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?
Egor Zverev, Sahar Abdelnabi, Soroush Tabesh, Mario Fritz, Christoph H. Lampert
11 Mar 2024
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models
Haibo Jin, Ruoxi Chen, Peiyan Zhang, Andy Zhou, Yang Zhang, Haohan Wang
LLMAG
05 Feb 2024