Transferable Ensemble Black-box Jailbreak Attacks on Large Language Models
arXiv:2410.23558 · 31 October 2024 · AAML
Yiqi Yang, Hongye Fu
Papers citing "Transferable Ensemble Black-box Jailbreak Attacks on Large Language Models" (3 of 3 shown):

How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, Ruoxi Jia, Weiyan Shi
12 Jan 2024

GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu, Xingwei Lin, Zheng Yu, Xinyu Xing
SILM · 19 Sep 2023

Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson
27 Jul 2023