AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
v2 (latest)

21 April 2024
Anselm Paulus, Arman Zharmagambetov, Chuan Guo, Brandon Amos, Yuandong Tian
AAML
ArXiv (abs) · PDF · HTML

Papers citing "AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs"

48 / 48 papers shown
• Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs
  Hiroshi Matsuda, Chunpeng Ma, Masayuki Asahara · 11 Jun 2025
• Beyond Jailbreaks: Revealing Stealthier and Broader LLM Security Risks Stemming from Alignment Failures
  Yukai Zhou, Sibei Yang, Wenjie Wang · AAML · 09 Jun 2025
• LARGO: Latent Adversarial Reflection through Gradient Optimization for Jailbreaking LLMs
  Ran Li, Hao Wang, Chengzhi Mao · AAML · 16 May 2025
• XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs
  Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera, V. P. · 30 Apr 2025
• JailbreaksOverTime: Detecting Jailbreak Attacks Under Distribution Shift
  Julien Piet, Xiao Huang, Dennis Jacob, Annabella Chow, Maha Alrashed, Geng Zhao, Zhanhao Hu, Chawin Sitawarin, Basel Alomair, David Wagner · AAML · 28 Apr 2025
• Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections
  Narek Maloyan, Dmitry Namiot · SILM, AAML, ELM · 25 Apr 2025
• WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks
  Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori, Chuan Guo, Kamalika Chaudhuri · AAML · 22 Apr 2025
• RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search
  Quy-Anh Dang, Chris Ngo, Truong-Son Hy · AAML, SyDa · 21 Apr 2025
• AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents
  Arman Zharmagambetov, Chuan Guo, Ivan Evtimov, Maya Pavlova, Ruslan Salakhutdinov, Kamalika Chaudhuri · LLMAG · 12 Mar 2025
• Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
  Thomas Winninger, Boussad Addad, Katarzyna Kapusta · AAML · 08 Mar 2025
• Jailbreaking Safeguarded Text-to-Image Models via Large Language Models
  Zhengyuan Jiang, Yuepeng Hu, Yue Yang, Yinzhi Cao, Neil Zhenqiang Gong · 03 Mar 2025
• Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models
  Meghana Arakkal Rajeev, Rajkumar Ramamurthy, Prapti Trivedi, Vikas Yadav, Oluwanifemi Bamgbose, Sathwik Tejaswi Madhusudan, James Zou, Nazneen Rajani · AAML, LRM · 03 Mar 2025
• GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-wild LLM Jailbreak Methods
  Ruixuan Huang, Xunguang Wang, Zongjie Li, Daoyuan Wu, Shuai Wang · ALM, ELM · 24 Feb 2025
• KDA: A Knowledge-Distilled Attacker for Generating Diverse Prompts to Jailbreak LLMs
  Buyun Liang, Kwan Ho Ryan Chan, D. Thaker, Jinqi Luo, René Vidal · AAML · 05 Feb 2025
• MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue
  Fengxiang Wang, Ranjie Duan, Peng Xiao, Xiaojun Jia, Shiji Zhao, ..., Hang Su, Jialing Tao, Hui Xue, Jun Zhu · LLMAG · 08 Jan 2025
• GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search
  Matan Ben-Tov, Mahmood Sharif · RALM · 31 Dec 2024
• DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM Jailbreak
  Hao Wang, Hao Li, Junda Zhu, Xinyuan Wang, Changzai Pan, Minlie Huang, Lei Sha · 23 Dec 2024
• Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context
  Nilanjana Das, Edward Raff, Aman Chadha, Manas Gaur · AAML · 20 Dec 2024
• Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios
  Yunkai Dang, Mengxi Gao, Yibo Yan, Xin Zou, Yanggan Gu, Aiwei Liu, Xuming Hu · 05 Nov 2024
• RobustKV: Defending Large Language Models against Jailbreak Attacks via KV Eviction
  Tanqiu Jiang, Zian Wang, Jiacheng Liang, Changjiang Li, Yuhui Wang, Ting Wang · AAML · 25 Oct 2024
• Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation
  Qizhang Li, Xiaochen Yang, W. Zuo, Yiwen Guo · AAML · 15 Oct 2024
• JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework
  Fan Liu, Yue Feng, Zhao Xu, Lixin Su, Xinyu Ma, D. Yin, Hao Liu · ELM · 11 Oct 2024
• Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations
  Tarun Raheja, Nilay Pochhi · AAML · 09 Oct 2024
• Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
  Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min Lin · 09 Oct 2024
• Harnessing Task Overload for Scalable Jailbreak Attacks on Large Language Models
  Yiting Dong, Guobin Shen, Dongcheng Zhao, Xiang He, Yi Zeng · 05 Oct 2024
• Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models
  Guobin Shen, Dongcheng Zhao, Yiting Dong, Xiang He, Yi Zeng · AAML · 03 Oct 2024
• Automated Red Teaming with GOAT: the Generative Offensive Agent Tester
  Maya Pavlova, Erik Brinkman, Krithika Iyer, Vítor Albiero, Joanna Bitton, Hailey Nguyen, Jingkai Li, Cristian Canton Ferrer, Ivan Evtimov, Aaron Grattafiori · ALM · 02 Oct 2024
• FlipAttack: Jailbreak LLMs via Flipping
  Yue Liu, Xiaoxin He, Miao Xiong, Jinlan Fu, Shumin Deng, Bryan Hooi · AAML · 02 Oct 2024
• HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
  Seanie Lee, Haebin Seong, Dong Bok Lee, Minki Kang, Xiaoyin Chen, Dominik Wagner, Yoshua Bengio, Juho Lee, Sung Ju Hwang · 02 Oct 2024
• Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data
  Atilla Akkus, Mingjie Li, Junjie Chu, Michael Backes, Sinem Sav · SILM, SyDa · 12 Sep 2024
• Recent Advances in Attack and Defense Approaches of Large Language Models
  Jing Cui, Yishi Xu, Zhewei Huang, Shuchang Zhou, Jianbin Jiao, Junge Zhang · PILM, AAML · 05 Sep 2024
• Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
  Bang An, Sicheng Zhu, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang · AAML · 01 Sep 2024
• Legilimens: Practical and Unified Content Moderation for Large Language Model Services
  Jialin Wu, Jiangyi Deng, Shengyuan Pang, Yanjiao Chen, Jiayang Xu, Xinfeng Li, Wei Dong · 28 Aug 2024
• Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
  Hongbang Yuan, Zhuoran Jin, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao · AAML, ELM, MU · 20 Aug 2024
• Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
  Jingtong Su, Mingyu Lee, SangKeun Lee · 02 Aug 2024
• AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases
  Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, Bo Li · LLMAG, AAML · 17 Jul 2024
• Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
  Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu · 12 Jul 2024
• AI Safety in Generative AI Large Language Models: A Survey
  Jaymari Chua, Yun Yvonna Li, Shiyi Yang, Chen Wang, Lina Yao · LM&MA · 06 Jul 2024
• Jailbreak Attacks and Defenses Against Large Language Models: A Survey
  Sibo Yi, Yule Liu, Zhen Sun, Tianshuo Cong, Xinlei He, Jiaxing Song, Ke Xu, Qi Li · AAML · 05 Jul 2024
• Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
  Zhao Xu, Fan Liu, Hao Liu · AAML · 13 Jun 2024
• SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
  Xunguang Wang, Daoyuan Wu, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Shuai Wang, Yingjiu Li, Yang Liu, Ning Liu, Juergen Rahmel · AAML · 08 Jun 2024
• Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs
  Fan Liu, Zhao Xu, Hao Liu · AAML · 07 Jun 2024
• Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses
  Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min Lin · AAML · 03 Jun 2024
• Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
  Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Xiaochun Cao, Min Lin · AAML · 31 May 2024
• Efficient Adversarial Training in LLMs with Continuous Attacks
  Sophie Xhonneux, Alessandro Sordoni, Stephan Günnemann, Gauthier Gidel, Leo Schwinn · AAML · 24 May 2024
• Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation
  Yuxi Li, Yi Liu, Yuekang Li, Ling Shi, Gelei Deng, Shengquan Chen, Kailong Wang · 20 May 2024
• Don't Say No: Jailbreaking LLM by Suppressing Refusal
  Yukai Zhou, Jian Lou, Zhijie Huang, Zhan Qin, Yibei Yang, Wenjie Wang · AAML · 25 Apr 2024
• AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks
  Yifan Zeng, Yiran Wu, Xiao Zhang, Huazheng Wang, Qingyun Wu · LLMAG, AAML · 02 Mar 2024