Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2412.02159
Cited By
Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach
3 December 2024
T. T. Wang
John Hughes
Henry Sleight
Rylan Schaeffer
Rajashree Agrawal
Fazl Barez
Mrinank Sharma
Jesse Mu
Nir Shavit
Ethan Perez
AAML
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach"
3 / 3 papers shown
Title
Attacking Multimodal OS Agents with Malicious Image Patches
Lukas Aichberger
Alasdair Paren
Y. Gal
Philip H. S. Torr
Adel Bibi
AAML
59
2
0
13 Mar 2025
À la recherche du sens perdu: your favourite LLM might have more to say than you can understand
K. O. T. Erziev
34
0
0
28 Feb 2025
FLAME: Flexible LLM-Assisted Moderation Engine
Ivan Bakulin
Ilia Kopanichuk
Iaroslav Bespalov
Nikita Radchenko
V. Shaposhnikov
Dmitry V. Dylov
Ivan Oseledets
96
0
0
13 Feb 2025
1