Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses

arXiv:2505.15738 · 21 May 2025
Xiaoxue Yang, Bozhidar Stevanoski, Matthieu Meeus, Yves-Alexandre de Montjoye
AAML
ArXiv (abs) · PDF · HTML

Papers citing "Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses"

2 / 2 papers shown

1. Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?
   Egor Zverev, Sahar Abdelnabi, Soroush Tabesh, Mario Fritz, Christoph H. Lampert
   11 Mar 2024 · 117 · 27 · 0

2. GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models
   Haibo Jin, Ruoxi Chen, Peiyan Zhang, Andy Zhou, Yang Zhang, Haohan Wang
   LLMAG
   05 Feb 2024 · 95 · 28 · 0