arXiv: 2310.12815
Formalizing and Benchmarking Prompt Injection Attacks and Defenses
19 October 2023
Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong
SILM, LLMAG
Papers citing "Formalizing and Benchmarking Prompt Injection Attacks and Defenses" (50 / 51 papers shown)
SecReEvalBench: A Multi-turned Security Resilience Evaluation Benchmark for Large Language Models
Huining Cui, Wei Liu · 12 May 2025 · AAML, ELM

Practical Reasoning Interruption Attacks on Reasoning Large Language Models
Yu Cui, Cong Zuo · 10 May 2025 · SILM, AAML, LRM

Defending against Indirect Prompt Injection by Instruction Detection
Tongyu Wen, Chenglong Wang, Xiyuan Yang, Haoyu Tang, Yueqi Xie, Lingjuan Lyu, Zhicheng Dou, Fangzhao Wu · 08 May 2025 · AAML

Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs
Chetan Pathade · 07 May 2025 · AAML, SILM

OET: Optimization-based prompt injection Evaluation Toolkit
Jinsheng Pan, Xiaogeng Liu, Chaowei Xiao · 01 May 2025 · AAML

Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction
Y. Chen, Haoran Li, Yuan Sui, Yi Liu, Yufei He, Yangqiu Song, Bryan Hooi · 29 Apr 2025 · AAML, SILM

Prompt Injection Attack to Tool Selection in LLM Agents
Jiawen Shi, Zenghui Yuan, Guiyao Tie, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun · 28 Apr 2025 · LLMAG

Exploring the Role of Large Language Models in Cybersecurity: A Systematic Survey
Shuang Tian, Tao Zhang, Jiaheng Liu, Jiacheng Wang, Xuangou Wu, ..., Ruichen Zhang, W. Zhang, Zhenhui Yuan, Shiwen Mao, Dong In Kim · 22 Apr 2025

WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks
Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori, Chuan Guo, Kamalika Chaudhuri · 22 Apr 2025 · AAML

Manipulating Multimodal Agents via Cross-Modal Prompt Injection
Le Wang, Zonghao Ying, Tianyuan Zhang, Siyuan Liang, Shengshan Hu, Mingchuan Zhang, A. Liu, Xianglong Liu · 19 Apr 2025 · AAML

DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks
Yupei Liu, Yuqi Jia, Jinyuan Jia, Dawn Song, Neil Zhenqiang Gong · 15 Apr 2025 · AAML

You've Changed: Detecting Modification of Black-Box Large Language Models
Alden Dima, James R. Foulds, Shimei Pan, Philip G. Feldman · 14 Apr 2025

Understanding Users' Security and Privacy Concerns and Attitudes Towards Conversational AI Platforms
Mutahar Ali, Arjun Arunasalam, Habiba Farrukh · 09 Apr 2025 · SILM

Frontier AI's Impact on the Cybersecurity Landscape
Wenbo Guo, Yujin Potter, Tianneng Shi, Zhun Wang, Andy Zhang, Dawn Song · 07 Apr 2025

Practical Poisoning Attacks against Retrieval-Augmented Generation
Baolei Zhang, Y. Chen, Minghong Fang, Zhuqing Liu, Lihai Nie, Tong Li, Zheli Liu · 04 Apr 2025 · SILM, AAML

Prompt Injection Detection and Mitigation via AI Multi-Agent NLP Frameworks
Diego Gosmar, Deborah A. Dahl, Dario Gosmar · 14 Mar 2025 · AAML

Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification
Yingjie Zhang, Tong Liu, Zhe Zhao, Guozhu Meng, Kai Chen · 14 Mar 2025 · AAML

RTBAS: Defending LLM Agents Against Prompt Injection and Privacy Leakage
Peter Yong Zhong, Siyuan Chen, Ruiqi Wang, McKenna McCall, Ben L. Titzer, Heather Miller, Phillip B. Gibbons · 17 Feb 2025 · LLMAG

OverThink: Slowdown Attacks on Reasoning LLMs
A. Kumar, Jaechul Roh, A. Naseh, Marzena Karpinska, Mohit Iyyer, Amir Houmansadr, Eugene Bagdasarian · 04 Feb 2025 · LRM

Peering Behind the Shield: Guardrail Identification in Large Language Models
Ziqing Yang, Yixin Wu, Rui Wen, Michael Backes, Yang Zhang · 03 Feb 2025

Breaking Focus: Contextual Distraction Curse in Large Language Models
Yue Huang, Yanbo Wang, Zixiang Xu, Chujie Gao, Siyuan Wu, Jiayi Ye, Xiuying Chen, Pin-Yu Chen, Xuzhi Zhang · 03 Feb 2025 · AAML

An Empirically-grounded tool for Automatic Prompt Linting and Repair: A Case Study on Bias, Vulnerability, and Optimization in Developer Prompts
Dhia Elhaq Rzig, Dhruba Jyoti Paul, Kaiser Pister, Jordan Henkel, Foyzul Hassan · 21 Jan 2025

RAG-Thief: Scalable Extraction of Private Data from Retrieval-Augmented Generation Applications with Agent-based Attacks
Changyue Jiang, Xudong Pan, Geng Hong, Chenfu Bao, Min Yang · 21 Nov 2024 · SILM

SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach
Ruoxi Sun, Jiamin Chang, Hammond Pearce, Chaowei Xiao, B. Li, Qi Wu, Surya Nepal, Minhui Xue · 17 Nov 2024

The Best Defense is a Good Offense: Countering LLM-Powered Cyberattacks
Daniel Ayzenshteyn, Roy Weiss, Yisroel Mirsky · 20 Oct 2024 · AAML

Are You Human? An Adversarial Benchmark to Expose LLMs
Gilad Gressel, Rahul Pankajakshan, Yisroel Mirsky · 12 Oct 2024 · DeLMO

Non-Halting Queries: Exploiting Fixed Points in LLMs
Ghaith Hammouri, Kemal Derya, B. Sunar · 08 Oct 2024

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, Yongfeng Zhang · 03 Oct 2024 · AAML, LLMAG, ELM

Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui, Yishi Xu, Zhewei Huang, Shuchang Zhou, Jianbin Jiao, Junge Zhang · 05 Sep 2024 · PILM, AAML

AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents
Edoardo Debenedetti, Jie Zhang, Mislav Balunović, Luca Beurer-Kellner, Marc Fischer, Florian Tramèr · 19 Jun 2024 · LLMAG, AAML

Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications
Stephen Burabari Tete · 16 Jun 2024

Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
Edoardo Debenedetti, Javier Rando, Daniel Paleka, Silaghi Fineas Florin, Dragos Albastroiu, ..., Stefan Kraft, Mario Fritz, Florian Tramèr, Sahar Abdelnabi, Lea Schonherr · 12 Jun 2024

SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
Xunguang Wang, Daoyuan Wu, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Shuai Wang, Yingjiu Li, Yang Liu, Ning Liu, Juergen Rahmel · 08 Jun 2024 · AAML

Is My Data in Your Retrieval Database? Membership Inference Attacks Against Retrieval Augmented Generation
Maya Anderson, Guy Amit, Abigail Goldsteen · 30 May 2024 · AAML

Voice Jailbreak Attacks Against GPT-4o
Xinyue Shen, Yixin Wu, Michael Backes, Yang Zhang · 29 May 2024 · AuLLM

Semantic-guided Prompt Organization for Universal Goal Hijacking against LLMs
Yihao Huang, Chong Wang, Xiaojun Jia, Qing-Wu Guo, Felix Juefei Xu, Jian Zhang, G. Pu, Yang Liu · 23 May 2024

Hidden You Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Logic Chain Injection
Zhilong Wang, Yebo Cao, Peng Liu · 07 Apr 2024

Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game
Qianqiao Xu, Zhiliang Tian, Hongyan Wu, Zhen Huang, Yiping Song, Feng Liu, Dongsheng Li · 03 Apr 2024 · LLMAG, AAML

Dialectical Alignment: Resolving the Tension of 3H and Security Threats of LLMs
Shu Yang, Jiayuan Su, Han Jiang, Mengdi Li, Keyuan Cheng, Muhammad Asif Ali, Lijie Hu, Di Wang · 30 Mar 2024

Large language models in 6G security: challenges and opportunities
Tri Nguyen, Huong Nguyen, Ahmad Ijaz, Saeid Sheikhi, Athanasios V. Vasilakos, Panos Kostakos · 18 Mar 2024 · ELM

Automatic and Universal Prompt Injection Attacks against Large Language Models
Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, Chaowei Xiao · 07 Mar 2024 · SILM, AAML

LLMs Can Defend Themselves Against Jailbreaking in a Practical Manner: A Vision Paper
Daoyuan Wu, Shuaibao Wang, Yang Liu, Ning Liu · 24 Feb 2024 · AAML

PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models
Wei Zou, Runpeng Geng, Binghui Wang, Jinyuan Jia · 12 Feb 2024 · SILM

Whispers in the Machine: Confidentiality in LLM-integrated Systems
Jonathan Evertz, Merlin Chlosta, Lea Schonherr, Thorsten Eisenhofer · 10 Feb 2024

StruQ: Defending Against Prompt Injection with Structured Queries
Sizhe Chen, Julien Piet, Chawin Sitawarin, David A. Wagner · 09 Feb 2024 · SILM, AAML

Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications
Xuchen Suo · 15 Jan 2024 · AAML, SILM

Jatmo: Prompt Injection Defense by Task-Specific Finetuning
Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, David A. Wagner · 29 Dec 2023 · AAML, SyDa

Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Wei Ping, Jinyuan Jia, Bo Li, Radha Poovendran · 07 Nov 2023 · AAML

Poisoning Language Models During Instruction Tuning
Alexander Wan, Eric Wallace, Sheng Shen, Dan Klein · 01 May 2023 · SILM

Extracting Training Data from Large Language Models
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, D. Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel · 14 Dec 2020 · MLAU, SILM