Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
arXiv:2405.20413, 30 May 2024
Haibo Jin, Andy Zhou, Joe D. Menke, Haohan Wang
Papers citing "Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters" (13 papers shown)
Access Controls Will Solve the Dual-Use Dilemma
Evžen Wybitul
14 May 2025
Evolving Security in LLMs: A Study of Jailbreak Attacks and Defenses
Zhengchun Shang, Wenlan Wei
02 Apr 2025
JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework
Fan Liu, Yue Feng, Zhao Xu, Lixin Su, Xinyu Ma, Dawei Yin, Hao Liu
11 Oct 2024
You Know What I'm Saying: Jailbreak Attack via Implicit Reference
Tianyu Wu, Lingrui Mei, Ruibin Yuan, Lujun Li, Wei Xue, Yike Guo
04 Oct 2024
Endless Jailbreaks with Bijection Learning
Brian R. Y. Huang, Maximilian Li, Leonard Tang
02 Oct 2024
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa, Bhrugu Bharathi, Long Phan, Andy Zhou, Alice Gatti, ..., Andy Zou, Dawn Song, Bo Li, Dan Hendrycks, Mantas Mazeika
01 Aug 2024
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu
12 Jul 2024
JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models
Haibo Jin, Leyang Hu, Xinuo Li, Peiyan Zhang, Chonghan Chen, Jun Zhuang, Haohan Wang
26 Jun 2024
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
Zhao Xu, Fan Liu, Hao Liu
13 Jun 2024
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, ..., Nicolas Flammarion, George J. Pappas, F. Tramèr, Hamed Hassani, Eric Wong
28 Mar 2024
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
Qibing Ren, Chang Gao, Jing Shao, Junchi Yan, Xin Tan, Wai Lam, Lizhuang Ma
12 Mar 2024
When "Competency" in Reasoning Opens the Door to Vulnerability: Jailbreaking LLMs via Novel Complex Ciphers
Divij Handa
Advait Chirmule
Bimal Gajera
Chitta Baral
Chitta Baral
50
18
0
16 Feb 2024
AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models
Dong Shu, Mingyu Jin, Suiyuan Zhu, Beichen Wang, Zihao Zhou, Chong Zhang, Yongfeng Zhang
17 Jan 2024