Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy

9 October 2024

Papers citing "Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy"

Title
The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them) | Zihao Wang, Yibo Jiang, Jiahao Yu, Heqing Huang | 35 0 0 | 01 May 2025
WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks | Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori, Chuan Guo, Kamalika Chaudhuri | AAML | 35 0 0 | 22 Apr 2025
ASIDE: Architectural Separation of Instructions and Data in Language Models | Egor Zverev, Evgenii Kortukov, Alexander Panfilov, Soroush Tabesh, Alexandra Volkova, Sebastian Lapuschkin, Wojciech Samek, Christoph H. Lampert | AAML | 54 1 0 | 13 Mar 2025
Control Illusion: The Failure of Instruction Hierarchies in Large Language Models | Yilin Geng, Hao Li, Honglin Mu, Xudong Han, Timothy Baldwin, Omri Abend, Eduard H. Hovy, Lea Frermann | 41 2 0 | 21 Feb 2025