Ignore Previous Prompt: Attack Techniques For Language Models
17 November 2022
Fábio Perez, Ian Ribeiro
SILM

Papers citing "Ignore Previous Prompt: Attack Techniques For Language Models"

Showing 50 of 284 citing papers.
PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization
Yidan Wang, Yanan Cao, Yubing Ren, Fang Fang, Zheng-Shen Lin, Binxing Fang
PILM · 15 May 2025

LM-Scout: Analyzing the Security of Language Model Integration in Android Apps
Muhammad Ibrahim, Güliz Seray Tuncay, Z. Berkay Celik, Aravind Machiry, Antonio Bianchi
13 May 2025

GRADA: Graph-based Reranker against Adversarial Documents Attack
Jingjie Zheng, Aryo Pradipta Gema, Giwon Hong, Xuanli He, Pasquale Minervini, Youcheng Sun, Qiongkai Xu
12 May 2025

SecReEvalBench: A Multi-turned Security Resilience Evaluation Benchmark for Large Language Models
Huining Cui, Wei Liu
AAML · ELM · 12 May 2025

System Prompt Poisoning: Persistent Attacks on Large Language Models Beyond User Injection
Jiawei Guo, Haipeng Cai
SILM · AAML · 10 May 2025

AgentXploit: End-to-End Redteaming of Black-Box AI Agents
Zhun Wang, Vincent Siu, Zhe Ye, Tianneng Shi, Yuzhou Nie, Xuandong Zhao, Chenguang Wang, Wenbo Guo, Dawn Song
LLMAG · AAML · 09 May 2025

Attack and defense techniques in large language models: A survey and new perspectives
Zhiyu Liao, Kang Chen, Yuanguo Lin, Kangkang Li, Yunxuan Liu, Hefeng Chen, Xingwang Huang, Yuanhui Yu
AAML · 02 May 2025

The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them)
Zihao Wang, Yibo Jiang, Jiahao Yu, Heqing Huang
01 May 2025

CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks
Rui Wang, Junda Wu, Yu Xia, Tong Yu, R. Zhang, Ryan A. Rossi, Lina Yao, Julian McAuley
AAML · SILM · 29 Apr 2025

Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction
Y. Chen, Haoran Li, Yuan Sui, Yijia Liu, Yufei He, Yangqiu Song, Bryan Hooi
AAML · SILM · 29 Apr 2025

Prompt Injection Attack to Tool Selection in LLM Agents
Jiawen Shi, Zenghui Yuan, Guiyao Tie, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun
LLMAG · 28 Apr 2025

Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections
Narek Maloyan, Dmitry Namiot
SILM · AAML · ELM · 25 Apr 2025

DoomArena: A framework for Testing AI Agents Against Evolving Security Threats
Léo Boisvert, Mihir Bansal, Chandra Kiran Reddy Evuru, Gabriel Huang, Abhay Puri, ..., Quentin Cappart, Jason Stanley, Alexandre Lacoste, Alexandre Drouin, Krishnamurthy Dvijotham
18 Apr 2025

Progent: Programmable Privilege Control for LLM Agents
Tianneng Shi, Jingxuan He, Zhun Wang, Linyu Wu, Hongwei Li, Wenbo Guo, Dawn Song
LLMAG · 16 Apr 2025

DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks
Yupei Liu, Yuqi Jia, Jinyuan Jia, Dawn Song, Neil Zhenqiang Gong
AAML · 15 Apr 2025

StruPhantom: Evolutionary Injection Attacks on Black-Box Tabular Agents Powered by Large Language Models
Yang Feng, Xudong Pan
AAML · 14 Apr 2025

You've Changed: Detecting Modification of Black-Box Large Language Models
Alden Dima, James R. Foulds, Shimei Pan, Philip G. Feldman
14 Apr 2025

AttentionDefense: Leveraging System Prompt Attention for Explainable Defense Against Novel Jailbreaks
Charlotte Siska, Anush Sankaran
AAML · 10 Apr 2025

PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization
Yang Jiao, X. Wang, Kai Yang
AAML · SILM · 10 Apr 2025

Understanding Users' Security and Privacy Concerns and Attitudes Towards Conversational AI Platforms
Mutahar Ali, Arjun Arunasalam, Habiba Farrukh
SILM · 09 Apr 2025

Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators
Xitao Li, Haoran Wang, Jiang Wu, Ting Liu
AAML · 08 Apr 2025

A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models
Carlos Peláez-González, Andrés Herrera-Poyatos, Cristina Zuheros, David Herrera-Poyatos, Virilo Tejedor, F. Herrera
AAML · 07 Apr 2025

Practical Poisoning Attacks against Retrieval-Augmented Generation
Baolei Zhang, Y. Chen, Minghong Fang, Zhuqing Liu, Lihai Nie, Tong Li, Zheli Liu
SILM · AAML · 04 Apr 2025

Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack
Murong Yue, Ziyu Yao
SILM · AAML · 18 Mar 2025

Multi-Agent Systems Execute Arbitrary Malicious Code
Harold Triedman, Rishi Jha, Vitaly Shmatikov
LLMAG · AAML · 15 Mar 2025

Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification
Yingjie Zhang, Tong Liu, Zhe Zhao, Guozhu Meng, Kai Chen
AAML · 14 Mar 2025

ASIDE: Architectural Separation of Instructions and Data in Language Models
Egor Zverev, Evgenii Kortukov, Alexander Panfilov, Soroush Tabesh, Alexandra Volkova, Sebastian Lapuschkin, Wojciech Samek, Christoph H. Lampert
AAML · 13 Mar 2025

Prompt Inference Attack on Distributed Large Language Model Inference Frameworks
Xinjian Luo, Ting Yu, X. Xiao
AAML · SILM · 12 Mar 2025

Safety Guardrails for LLM-Enabled Robots
Zachary Ravichandran, Alexander Robey, Vijay Kumar, George Pappas, Hamed Hassani
10 Mar 2025

Adversarial Training for Multimodal Large Language Models against Jailbreak Attacks
Liming Lu, Shuchao Pang, Siyuan Liang, Haotian Zhu, Xiyu Zeng, Aishan Liu, Yunhuai Liu, Yongbin Zhou
AAML · 05 Mar 2025

Adversarial Tokenization
Renato Lui Geh, Zilei Shao, Mathias Niepert
SILM · AAML · 04 Mar 2025

UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning
Junzhe Zhang, Shuang Yang, B. Li
AAML · LLMAG · 28 Feb 2025

ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments
Hojae Han, Seung-won Hwang, Rajhans Samdani, Yuxiong He
ALM · 27 Feb 2025

Shh, don't say that! Domain Certification in LLMs
Cornelius Emde, Alasdair Paren, Preetham Arvind, Maxime Kayser, Tom Rainforth, Thomas Lukasiewicz, Guohao Li, Philip Torr, Adel Bibi
26 Feb 2025

On the Robustness of Transformers against Context Hijacking for Linear Classification
Tianle Li, Chenyang Zhang, Xingwu Chen, Yuan Cao, Difan Zou
24 Feb 2025

Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment
Pedram Zaree, Md Abdullah Al Mamun, Quazi Mishkatul Alam, Yue Dong, Ihsen Alouani, Nael B. Abu-Ghazaleh
AAML · 24 Feb 2025

Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs
Giulio Zizzo, Giandomenico Cornacchia, Kieran Fraser, Muhammad Zaid Hameed, Ambrish Rawat, Beat Buesser, Mark Purcell, Pin-Yu Chen, P. Sattigeri, Kush R. Varshney
AAML · 24 Feb 2025

Can Indirect Prompt Injection Attacks Be Detected and Removed?
Yulin Chen, Haoran Li, Yuan Sui, Yufei He, Yue Liu, Yangqiu Song, Bryan Hooi
AAML · 23 Feb 2025

Guardians of the Agentic System: Preventing Many Shots Jailbreak with Agentic System
Saikat Barua, Mostafizur Rahman, Md Jafor Sadek, Rafiul Islam, Shehnaz Khaled, Ahmedul Kabir
LLMAG · 23 Feb 2025

Detecting Phishing Sites Using ChatGPT
Takashi Koide, Naoki Fukushi, Hiroki Nakano, Daiki Chiba
17 Feb 2025

OverThink: Slowdown Attacks on Reasoning LLMs
A. Kumar, Jaechul Roh, A. Naseh, Marzena Karpinska, Mohit Iyyer, Amir Houmansadr, Eugene Bagdasarian
LRM · 04 Feb 2025

Topic-FlipRAG: Topic-Orientated Adversarial Opinion Manipulation Attacks to Retrieval-Augmented Generation Models
Jiawei Liu, Zhuo Chen, Miaokun Chen, Fengchang Yu, Fan Zhang, Xiaofeng Wang, Wei Lu, Xiaozhong Liu
AAML · SILM · 03 Feb 2025

Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation
A. Naseh, Yuefeng Peng, Anshuman Suri, Harsh Chaudhari, Alina Oprea, Amir Houmansadr
SILM · AAML · RALM · 01 Feb 2025

Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, Fangzhao Wu
AAML · 28 Jan 2025

An Empirically-grounded tool for Automatic Prompt Linting and Repair: A Case Study on Bias, Vulnerability, and Optimization in Developer Prompts
Dhia Elhaq Rzig, Dhruba Jyoti Paul, Kaiser Pister, Jordan Henkel, Foyzul Hassan
21 Jan 2025

Authenticated Delegation and Authorized AI Agents
Tobin South, Samuele Marro, Thomas Hardjono, Robert Mahari, Cedric Deslandes Whitney, Dazza Greenwood, Alan Chan, Alex Pentland
17 Jan 2025

Safeguarding System Prompts for LLMs
Zhifeng Jiang, Zhihua Jin, Guoliang He
AAML · SILM · 10 Jan 2025

ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Bill Yuchen Lin, Radha Poovendran
SILM · 08 Jan 2025

LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models
Miao Yu, Junfeng Fang, Yingjie Zhou, Xing Fan, Kun Wang, Shirui Pan, Qingsong Wen
AAML · 03 Jan 2025

The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents
Feiran Jia, Tong Wu, Xin Qin, Anna Squicciarini
LLMAG · AAML · 21 Dec 2024