Jailbreaking Black Box Large Language Models in Twenty Queries
arXiv:2310.08419 · 12 October 2023
Patrick Chao, Alexander Robey, Yan Sun, Hamed Hassani, George J. Pappas, Eric Wong
Tags: AAML
Papers citing "Jailbreaking Black Box Large Language Models in Twenty Queries" (46 of 196 papers shown):
- Few-shot Personalization of LLMs with Mis-aligned Responses (Jaehyung Kim, Yiming Yang; 26 Jun 2024)
- Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference (Anton Xue, Avishree Khare, Rajeev Alur, Surbhi Goel, Eric Wong; 21 Jun 2024)
- "Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak (Lingrui Mei, Shenghua Liu, Yiwei Wang, Baolong Bi, Jiayi Mao, Xueqi Cheng; AAML; 17 Jun 2024)
- Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models (Shangqing Tu, Zhuoran Pan, Wenxuan Wang, Zhexin Zhang, Yuliang Sun, Jifan Yu, Hongning Wang, Lei Hou, Juanzi Li; ALM; 17 Jun 2024)
- JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models (Delong Ran, Jinyuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang; ELM; 13 Jun 2024)
- Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models (Xi Li, Ruofan Mao, Yusen Zhang, Renze Lou, Chen Wu, Jiaqi Wang; LRM, AAML; 10 Jun 2024)
- Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents (Avital Shafran, R. Schuster, Vitaly Shmatikov; 09 Jun 2024)
- SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner (Xunguang Wang, Daoyuan Wu, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Shuai Wang, Yingjiu Li, Yang Liu, Ning Liu, Juergen Rahmel; AAML; 08 Jun 2024)
- Ranking Manipulation for Conversational Search Engines (Samuel Pfrommer, Yatong Bai, Tanmay Gautam, Somayeh Sojoudi; SILM; 05 Jun 2024)
- Defending Large Language Models Against Attacks With Residual Stream Activation Analysis (Amelia Kawasaki, Andrew Davis, Houssam Abbas; AAML, KELM; 05 Jun 2024)
- Exploring Vulnerabilities and Protections in Large Language Models: A Survey (Frank Weizhen Liu, Chenhui Hu; AAML; 01 Jun 2024)
- Mind the Inconspicuous: Revealing the Hidden Weakness in Aligned LLMs' Refusal Boundaries (Jiahao Yu, Haozheng Luo, Jerry Yao-Chieh Hu, Wenbo Guo, Han Liu, Xinyu Xing; 31 May 2024)
- OR-Bench: An Over-Refusal Benchmark for Large Language Models (Justin Cui, Wei-Lin Chiang, Ion Stoica, Cho-Jui Hsieh; ALM; 31 May 2024)
- Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks (Chen Xiong, Xiangyu Qi, Pin-Yu Chen, Tsung-Yi Ho; AAML; 30 May 2024)
- Learning diverse attacks on large language models for robust red-teaming and safety tuning (Seanie Lee, Minsu Kim, Lynn Cherif, David Dobre, Juho Lee, ..., Kenji Kawaguchi, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Moksh Jain; AAML; 28 May 2024)
- Securing the Future of GenAI: Policy and Technology (Mihai Christodorescu, Craven, Soheil Feizi, Neil Zhenqiang Gong, Mia Hoffmann, ..., Jessica Newman, Emelia Probasco, Yanjun Qi, Khawaja Shams, Turek; SILM; 21 May 2024)
- GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation (Govind Ramesh, Yao Dou, Wei Xu; PILM; 21 May 2024)
- Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors (Jiachen Sun, Changsheng Wang, Jiong Wang, Yiwei Zhang, Chaowei Xiao; AAML, VLM; 17 May 2024)
- Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs (Valeriia Cherepanova, James Zou; AAML; 26 Apr 2024)
- AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs (Anselm Paulus, Arman Zharmagambetov, Chuan Guo, Brandon Amos, Yuandong Tian; AAML; 21 Apr 2024)
- GenFighter: A Generative and Evolutive Textual Attack Removal (Md Athikul Islam, Edoardo Serra, Sushil Jajodia; AAML; 17 Apr 2024)
- JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models (Yingchaojie Feng, Zhizhang Chen, Zhining Kang, Sijia Wang, Haoyu Tian, Wei Zhang, Minfeng Zhu, Wei Chen; 12 Apr 2024)
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks (Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion; AAML; 02 Apr 2024)
- Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack (M. Russinovich, Ahmed Salem, Ronen Eldan; 02 Apr 2024)
- Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation (Yutong He, Alexander Robey, Naoki Murata, Yiding Jiang, J. Williams, George Pappas, Hamed Hassani, Yuki Mitsufuji, Ruslan Salakhutdinov, J. Zico Kolter; DiffM; 28 Mar 2024)
- Optimization-based Prompt Injection Attack to LLM-as-a-Judge (Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong; AAML; 26 Mar 2024)
- Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models (Yifan Li, Hangyu Guo, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen; 14 Mar 2024)
- Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? (Egor Zverev, Sahar Abdelnabi, Soroush Tabesh, Mario Fritz, Christoph H. Lampert; 11 Mar 2024)
- Here Comes The AI Worm: Unleashing Zero-click Worms that Target GenAI-Powered Applications (Stav Cohen, Ron Bitton, Ben Nassi; 05 Mar 2024)
- ImgTrojan: Jailbreaking Vision-Language Models with ONE Image (Xijia Tao, Shuai Zhong, Lei Li, Qi Liu, Lingpeng Kong; 05 Mar 2024)
- Round Trip Translation Defence against Large Language Model Jailbreaking Attacks (Canaan Yung, H. M. Dolatabadi, S. Erfani, Christopher Leckie; AAML; 21 Feb 2024)
- Is the System Message Really Important to Jailbreaks in Large Language Models? (Xiaotian Zou, Yongkang Chen, Ke Li; 20 Feb 2024)
- Defending Jailbreak Prompts via In-Context Adversarial Game (Yujun Zhou, Yufei Han, Haomin Zhuang, Kehan Guo, Zhenwen Liang, Hongyan Bao, Xiangliang Zhang; LLMAG, AAML; 20 Feb 2024)
- OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models (Yuxuan Kuang, Hai Lin, Meng Jiang; LM&Ro; 16 Feb 2024)
- PAL: Proxy-Guided Black-Box Attack on Large Language Models (Chawin Sitawarin, Norman Mu, David Wagner, Alexandre Araujo; ELM; 15 Feb 2024)
- Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space (Leo Schwinn, David Dobre, Sophie Xhonneux, Gauthier Gidel, Stephan Günnemann; AAML; 14 Feb 2024)
- Attacking Large Language Models with Projected Gradient Descent (Simon Geisler, Tom Wollschläger, M. H. I. Abdalla, Johannes Gasteiger, Stephan Günnemann; AAML, SILM; 14 Feb 2024)
- GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models (Haibo Jin, Ruoxi Chen, Peiyan Zhang, Andy Zhou, Yang Zhang, Haohan Wang; LLMAG; 05 Feb 2024)
- Weak-to-Strong Jailbreaking on Large Language Models (Xuandong Zhao, Xianjun Yang, Tianyu Pang, Chao Du, Lei Li, Yu-Xiang Wang, William Y. Wang; 30 Jan 2024)
- Black-Box Access is Insufficient for Rigorous AI Audits (Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, ..., Michael Gerovitch, David Bau, Max Tegmark, David M. Krueger, Dylan Hadfield-Menell; AAML; 25 Jan 2024)
- Large Language Models are Null-Shot Learners (Pittawat Taveekitworachai, Febri Abdullah, R. Thawonmas; LRM; 16 Jan 2024)
- MetaAID 2.5: A Secure Framework for Developing Metaverse Applications via Large Language Models (Hongyin Zhu; 22 Dec 2023)
- Bypassing the Safety Training of Open-Source LLMs with Priming Attacks (Jason Vega, Isha Chaudhary, Changming Xu, Gagandeep Singh; AAML; 19 Dec 2023)
- Hijacking Large Language Models via Adversarial In-Context Learning (Yao Qiang, Xiangyu Zhou, Saleh Zare Zade, Prashant Khanduri, Dongxiao Zhu; 16 Nov 2023)
- FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts (Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, Xiaoyun Wang; MLLM; 09 Nov 2023)
- SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks (Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas; AAML; 05 Oct 2023)