Universal Jailbreak Suffixes Are Strong Attention Hijackers

15 June 2025
Matan Ben-Tov
Mor Geva
Mahmood Sharif
Main: 10 pages; bibliography: 4 pages; appendix: 5 pages; 18 figures; 3 tables
Abstract

We study suffix-based jailbreaks, a powerful family of attacks against large language models (LLMs) that optimize adversarial suffixes to circumvent safety alignment. Focusing on the widely used foundational GCG attack (Zou et al., 2023), we observe that suffixes vary in efficacy: some are markedly more universal than others, generalizing to many unseen harmful instructions. We first show that GCG's effectiveness is driven by a shallow, critical mechanism built on the information flow from the adversarial suffix to the final chat-template tokens before generation. Quantifying the dominance of this mechanism during generation, we find that GCG irregularly and aggressively hijacks the contextualization process. Crucially, we tie hijacking to the universality phenomenon: more universal suffixes are stronger hijackers. We then show that these insights have practical implications. GCG universality can be efficiently enhanced (up to ×5 in some cases) at no additional computational cost, and it can also be surgically mitigated, at least halving attack success with minimal utility loss. We release our code and data at this http URL.
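The abstract does not spell out how hijacking strength is measured or how the mitigation is implemented. As a rough illustration only, the Python sketch below assumes a simplified proxy: the average attention mass that the final chat-template tokens place on the adversarial-suffix tokens, computed on a toy causal attention tensor. It also sketches a hypothetical mitigation in the same spirit, down-weighting template-to-suffix attention by a factor `alpha` and renormalizing each row. The functions `hijacking_score`, `dampen_suffix_to_template` and `toy_causal_attention`, the parameter `alpha`, and the token layout are all assumptions for illustration, not the authors' released code or exact metric.

```python
import numpy as np


def toy_causal_attention(layers=2, heads=4, seq=12, seed=0):
    """Hypothetical stand-in for real attention maps: random logits under a causal mask,
    softmax-normalized so every query row sums to 1. Shape: (layers, heads, seq, seq)."""
    rng = np.random.default_rng(seed)
    logits = rng.normal(size=(layers, heads, seq, seq))
    causal = np.tril(np.ones((seq, seq), dtype=bool))  # query i may attend to keys 0..i
    logits = np.where(causal, logits, -np.inf)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))  # masked entries become 0
    return weights / weights.sum(axis=-1, keepdims=True)


def hijacking_score(attn, suffix, template):
    """Simplified proxy: mean attention mass that template-token queries place on
    suffix-token keys, averaged over layers, heads and template positions."""
    mass = attn[:, :, template, suffix].sum(axis=-1)  # shape: (layers, heads, n_template)
    return float(mass.mean())


def dampen_suffix_to_template(attn, suffix, template, alpha):
    """Hypothetical mitigation: scale template->suffix attention by alpha in [0, 1],
    then renormalize the affected query rows so they remain distributions."""
    out = attn.copy()  # leave the input untouched
    out[:, :, template, suffix] *= alpha
    out[:, :, template, :] /= out[:, :, template, :].sum(axis=-1, keepdims=True)
    return out


# Assumed layout: positions 0-5 instruction, 6-9 adversarial suffix, 10-11 chat template.
attn = toy_causal_attention()
suffix, template = slice(6, 10), slice(10, 12)
before = hijacking_score(attn, suffix, template)
after = hijacking_score(dampen_suffix_to_template(attn, suffix, template, 0.5), suffix, template)
print(f"hijacking proxy: {before:.3f} before dampening, {after:.3f} after")
```

With real models, one would extract per-layer attention weights for a prompt instead of using the toy tensor; the proxy here ignores value norms and other factors a more careful dominance measure might account for.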

@article{ben-tov2025_2506.12880,
  title={Universal Jailbreak Suffixes Are Strong Attention Hijackers},
  author={Matan Ben-Tov and Mor Geva and Mahmood Sharif},
  journal={arXiv preprint arXiv:2506.12880},
  year={2025}
}