Query-Based Adversarial Prompt Generation

19 February 2024

Jonathan Hayase

Nicholas Carlini

Florian Tramèr

Papers citing "Query-Based Adversarial Prompt Generation"

Title | Authors | Tags | Counts | Date
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models | Thomas Winninger, Boussad Addad, Katarzyna Kapusta | AAML | 91 1 0 | 08 Mar 2025
Smoothed Embeddings for Robust Language Models | Ryo Hase, Md Rafi Ur Rashid, Ashley Lewis, Jing Liu, T. Koike-Akino, K. Parsons, Yanjie Wang | AAML | 83 2 0 | 27 Jan 2025
Does Refusal Training in LLMs Generalize to the Past Tense? | Maksym Andriushchenko, Nicolas Flammarion | | 99 33 0 | 16 Jul 2024
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion | AAML | 142 203 0 | 02 Apr 2024
PAL: Proxy-Guided Black-Box Attack on Large Language Models | Chawin Sitawarin, Norman Mu, David Wagner, Alexandre Araujo | ELM | 49 32 0 | 15 Feb 2024
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models | Haibo Jin, Ruoxi Chen, Peiyan Zhang, Andy Zhou, Yang Zhang, Haohan Wang | LLMAG | 64 25 0 | 05 Feb 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts | Jiahao Yu, Xingwei Lin, Zheng Yu, Xinyu Xing | SILM | 159 340 0 | 19 Sep 2023
Image Hijacks: Adversarial Images can Control Generative Models at Runtime | Luke Bailey, Euan Ong, Stuart J. Russell, Scott Emmons | VLM, MLLM | 54 84 0 | 01 Sep 2023
Universal and Transferable Adversarial Attacks on Aligned Language Models | Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson | | 248 1,436 0 | 27 Jul 2023
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection | Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, C. Endres, Thorsten Holz, Mario Fritz | SILM | 108 483 0 | 23 Feb 2023
TextBugger: Generating Adversarial Text Against Real-world Applications | Jinfeng Li, S. Ji, Tianyu Du, Bo Li, Ting Wang | SILM, AAML | 179 737 0 | 13 Dec 2018
Evasion Attacks against Machine Learning at Test Time | Battista Biggio, Igino Corona, Davide Maiorca, B. Nelson, Nedim Srndic, Pavel Laskov, Giorgio Giacinto, Fabio Roli | AAML | 117 2,145 0 | 21 Aug 2017
Towards Deep Learning Models Resistant to Adversarial Attacks | Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu | SILM, OOD | 259 12,029 0 | 19 Jun 2017
Delving into Transferable Adversarial Examples and Black-box Attacks | Yanpei Liu, Xinyun Chen, Chang-rui Liu, D. Song | AAML | 133 1,731 0 | 08 Nov 2016
Towards Evaluating the Robustness of Neural Networks | Nicholas Carlini, D. Wagner | OOD, AAML | 214 8,533 0 | 16 Aug 2016
Intriguing properties of neural networks | Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, D. Erhan, Ian Goodfellow, Rob Fergus | AAML | 229 14,893 1 | 21 Dec 2013