Jailbreak Distillation: Renewable Safety Benchmarking
arXiv: 2505.22037
28 May 2025
Jingyu Zhang, Ahmed Elgohary, Xiawei Wang, A S M Iftekhar, Ahmed Magooda, Benjamin Van Durme, Daniel Khashabi, Kyle Jackson
Tags: AAML, ALM
Papers citing "Jailbreak Distillation: Renewable Safety Benchmarking" (11 papers)
Measuring General Intelligence with Generated Games
Vivek Verma, David Huang, William Chen, Dan Klein, Nicholas Tomlin
Tags: ReLM, ELM, LM&MA, LRM
12 May 2025
Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges
Francisco Eiras, Eliott Zemour, Eric Lin, Vaikkunth Mugunthan
Tags: ELM
06 Mar 2025
Recent Advances in Large Language Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation
Simin Chen, Yiming Chen, Zexin Li, Yifan Jiang, Zhongwei Wan, ..., Dezhi Ran, Tianle Gu, Haoyang Li, Tao Xie, Baishakhi Ray
23 Feb 2025
Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities
Zora Che, Stephen Casper, Robert Kirk, Anirudh Satheesh, Stewart Slocum, ..., Zikui Cai, Bilal Chughtai, Y. Gal, Furong Huang, Dylan Hadfield-Menell
Tags: MU, AAML, ELM
03 Feb 2025
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
Xiaogeng Liu, Peiran Li, Edward Suh, Yevgeniy Vorobeychik, Zhuoqing Mao, Somesh Jha, Patrick McDaniel, Huan Sun, Bo Li, Chaowei Xiao
03 Oct 2024
RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking
Yifan Jiang, Kriti Aggarwal, Tanmay Laud, Kashif Munir, Jay Pujara, Subhabrata Mukherjee
Tags: AAML
26 Sep 2024
ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming
Simone Tedeschi, Felix Friedrich, P. Schramowski, Kristian Kersting, Roberto Navigli, Huu Nguyen, Bo Li
Tags: ELM
06 Apr 2024
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
Tags: AAML
02 Apr 2024
Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack
M. Russinovich, Ahmed Salem, Ronen Eldan
02 Apr 2024
How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, Ruoxi Jia, Weiyan Shi
12 Jan 2024
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson
27 Jul 2023