Stronger Enforcement of Instruction Hierarchy via Augmented Intermediate Representations


25 May 2025
Sanjay Kariyappa
G. E. Suh

Papers citing "Stronger Enforcement of Instruction Hierarchy via Augmented Intermediate Representations"

Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
Mrinank Sharma
Meg Tong
Jesse Mu
Jerry Wei
Jorrit Kruthoff
...
Ruiqi Zhong
Giulio Zhou
Jan Leike
Jared Kaplan
Ethan Perez
209
34
0
31 Jan 2025
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?
Egor Zverev
Sahar Abdelnabi
Soroush Tabesh
Mario Fritz
Christoph H. Lampert
117
27
0
11 Mar 2024