StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models

Main: 15 pages · 7 figures · 6 tables · Bibliography: 4 pages · Appendix: 1 page
Abstract
In this work, we present a series of structure transformation attacks on LLM alignment, in which we encode natural-language intent using diverse syntax spaces, ranging from simple structured formats and basic query languages (e.g., SQL) to novel spaces and syntaxes created entirely by LLMs. Our extensive evaluation shows that even our simplest attacks achieve close to a 90% success rate, including against strict LLMs (such as Claude 3.5 Sonnet) that use state-of-the-art alignment mechanisms. We further improve attack performance with an adaptive scheme that combines structure transformations with existing content transformations, reaching over 96% ASR with 0% refusals.
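The abstract does not reproduce the paper's actual prompt templates or syntax spaces, but the core idea of a structure transformation is to re-encode a plain-language request into a different syntax before it reaches the model. The sketch below is a minimal, hypothetical illustration of one such transformation into SQL-like syntax, using an invented helper (encode_as_sql) and a benign placeholder task; it is not the authors' implementation.

```python
# Minimal illustrative sketch of a "structure transformation": re-encoding a
# plain-language request as a SQL-style structure before sending it to an LLM.
# The function name, template, and task are hypothetical and benign; the paper's
# actual templates and syntax spaces are not reproduced here.

def encode_as_sql(intent: str) -> str:
    """Wrap a natural-language intent inside a SQL-like syntax space."""
    words = intent.split()
    rows = ",\n    ".join(f"({i}, '{w}')" for i, w in enumerate(words))
    return (
        "CREATE TABLE tasks (pos INT, token TEXT);\n"
        f"INSERT INTO tasks (pos, token) VALUES\n    {rows};\n"
        "-- Carry out the task described by the tokens ordered by pos."
    )

if __name__ == "__main__":
    # Benign placeholder intent; the point is the change of syntax, not the content.
    print(encode_as_sql("summarize the attached report in three bullet points"))
```

The intent stays in natural language at the token level; only the surrounding syntax changes, which is what distinguishes structure transformations from the content transformations the abstract mentions combining them with.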
@article{yoosuf2025_2502.11853,
  title={StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models},
  author={Shehel Yoosuf and Temoor Ali and Ahmed Lekssays and Mashael AlSabah and Issa Khalil},
  journal={arXiv preprint arXiv:2502.11853},
  year={2025}
}