ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.12329
  4. Cited By
Query-Based Adversarial Prompt Generation

Query-Based Adversarial Prompt Generation

19 February 2024
Jonathan Hayase
Ema Borevkovic
Nicholas Carlini
Florian Tramèr
Milad Nasr
    AAML
    SILM
ArXivPDFHTML

Papers citing "Query-Based Adversarial Prompt Generation"

16 / 16 papers shown
Title
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger
Boussad Addad
Katarzyna Kapusta
AAML
91
1
0
08 Mar 2025
Smoothed Embeddings for Robust Language Models
Smoothed Embeddings for Robust Language Models
Ryo Hase
Md Rafi Ur Rashid
Ashley Lewis
Jing Liu
T. Koike-Akino
K. Parsons
Yanjie Wang
AAML
83
2
0
27 Jan 2025
Does Refusal Training in LLMs Generalize to the Past Tense?
Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko
Nicolas Flammarion
99
33
0
16 Jul 2024
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko
Francesco Croce
Nicolas Flammarion
AAML
142
203
0
02 Apr 2024
PAL: Proxy-Guided Black-Box Attack on Large Language Models
PAL: Proxy-Guided Black-Box Attack on Large Language Models
Chawin Sitawarin
Norman Mu
David Wagner
Alexandre Araujo
ELM
49
32
0
15 Feb 2024
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models
Haibo Jin
Ruoxi Chen
Peiyan Zhang
Andy Zhou
Yang Zhang
Haohan Wang
LLMAG
64
25
0
05 Feb 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated
  Jailbreak Prompts
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
159
340
0
19 Sep 2023
Image Hijacks: Adversarial Images can Control Generative Models at
  Runtime
Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Luke Bailey
Euan Ong
Stuart J. Russell
Scott Emmons
VLM
MLLM
54
84
0
01 Sep 2023
Universal and Transferable Adversarial Attacks on Aligned Language
  Models
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
248
1,436
0
27 Jul 2023
Not what you've signed up for: Compromising Real-World LLM-Integrated
  Applications with Indirect Prompt Injection
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake
Sahar Abdelnabi
Shailesh Mishra
C. Endres
Thorsten Holz
Mario Fritz
SILM
108
483
0
23 Feb 2023
TextBugger: Generating Adversarial Text Against Real-world Applications
TextBugger: Generating Adversarial Text Against Real-world Applications
Jinfeng Li
S. Ji
Tianyu Du
Bo Li
Ting Wang
SILM
AAML
179
737
0
13 Dec 2018
Evasion Attacks against Machine Learning at Test Time
Evasion Attacks against Machine Learning at Test Time
Battista Biggio
Igino Corona
Davide Maiorca
B. Nelson
Nedim Srndic
Pavel Laskov
Giorgio Giacinto
Fabio Roli
AAML
117
2,145
0
21 Aug 2017
Towards Deep Learning Models Resistant to Adversarial Attacks
Towards Deep Learning Models Resistant to Adversarial Attacks
Aleksander Madry
Aleksandar Makelov
Ludwig Schmidt
Dimitris Tsipras
Adrian Vladu
SILM
OOD
259
12,029
0
19 Jun 2017
Delving into Transferable Adversarial Examples and Black-box Attacks
Delving into Transferable Adversarial Examples and Black-box Attacks
Yanpei Liu
Xinyun Chen
Chang-rui Liu
D. Song
AAML
133
1,731
0
08 Nov 2016
Towards Evaluating the Robustness of Neural Networks
Towards Evaluating the Robustness of Neural Networks
Nicholas Carlini
D. Wagner
OOD
AAML
214
8,533
0
16 Aug 2016
Intriguing properties of neural networks
Intriguing properties of neural networks
Christian Szegedy
Wojciech Zaremba
Ilya Sutskever
Joan Bruna
D. Erhan
Ian Goodfellow
Rob Fergus
AAML
229
14,893
1
21 Dec 2013
1