Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.12329
Cited By
Query-Based Adversarial Prompt Generation
19 February 2024
Jonathan Hayase
Ema Borevkovic
Nicholas Carlini
Florian Tramèr
Milad Nasr
AAML
SILM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Query-Based Adversarial Prompt Generation"
16 / 16 papers shown
Title
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger
Boussad Addad
Katarzyna Kapusta
AAML
91
1
0
08 Mar 2025
Smoothed Embeddings for Robust Language Models
Ryo Hase
Md Rafi Ur Rashid
Ashley Lewis
Jing Liu
T. Koike-Akino
K. Parsons
Yanjie Wang
AAML
83
2
0
27 Jan 2025
Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko
Nicolas Flammarion
99
33
0
16 Jul 2024
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko
Francesco Croce
Nicolas Flammarion
AAML
142
203
0
02 Apr 2024
PAL: Proxy-Guided Black-Box Attack on Large Language Models
Chawin Sitawarin
Norman Mu
David Wagner
Alexandre Araujo
ELM
49
32
0
15 Feb 2024
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models
Haibo Jin
Ruoxi Chen
Peiyan Zhang
Andy Zhou
Yang Zhang
Haohan Wang
LLMAG
64
25
0
05 Feb 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
159
340
0
19 Sep 2023
Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Luke Bailey
Euan Ong
Stuart J. Russell
Scott Emmons
VLM
MLLM
54
84
0
01 Sep 2023
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
248
1,436
0
27 Jul 2023
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake
Sahar Abdelnabi
Shailesh Mishra
C. Endres
Thorsten Holz
Mario Fritz
SILM
108
483
0
23 Feb 2023
TextBugger: Generating Adversarial Text Against Real-world Applications
Jinfeng Li
S. Ji
Tianyu Du
Bo Li
Ting Wang
SILM
AAML
179
737
0
13 Dec 2018
Evasion Attacks against Machine Learning at Test Time
Battista Biggio
Igino Corona
Davide Maiorca
B. Nelson
Nedim Srndic
Pavel Laskov
Giorgio Giacinto
Fabio Roli
AAML
117
2,145
0
21 Aug 2017
Towards Deep Learning Models Resistant to Adversarial Attacks
Aleksander Madry
Aleksandar Makelov
Ludwig Schmidt
Dimitris Tsipras
Adrian Vladu
SILM
OOD
259
12,029
0
19 Jun 2017
Delving into Transferable Adversarial Examples and Black-box Attacks
Yanpei Liu
Xinyun Chen
Chang-rui Liu
D. Song
AAML
133
1,731
0
08 Nov 2016
Towards Evaluating the Robustness of Neural Networks
Nicholas Carlini
D. Wagner
OOD
AAML
214
8,533
0
16 Aug 2016
Intriguing properties of neural networks
Christian Szegedy
Wojciech Zaremba
Ilya Sutskever
Joan Bruna
D. Erhan
Ian Goodfellow
Rob Fergus
AAML
229
14,893
1
21 Dec 2013
1