JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models

12 April 2024

Papers citing "JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models"

Title | Authors | Tags | Counts | Date
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models | Isha Gupta, David Khachaturov, Robert D. Mullins | AAML, AuLLM | 115 4 0 | 02 Feb 2025
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs | Zhao Xu, Fan Liu, Hao Liu | AAML | 126 16 0 | 13 Jun 2024
Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs | Fan Liu, Zhao Xu, Hao Liu | AAML | 130 13 0 | 07 Jun 2024
A Unified Approach to Interpreting Model Predictions | Scott M. Lundberg, Su-In Lee | FAtt | 1.2K 22,295 0 | 22 May 2017