Ignore Previous Prompt: Attack Techniques For Language Models
Fábio Perez, Ian Ribeiro
17 November 2022 · arXiv:2211.09527 · SILM
Papers citing "Ignore Previous Prompt: Attack Techniques For Language Models" (50 of 284 shown)
A Comprehensive Survey of Attack Techniques, Implementation, and Mitigation Strategies in Large Language Models
Aysan Esmradi, Daniel Wankit Yip, C. Chan · AAML · 18 Dec 2023

Maatphor: Automated Variant Analysis for Prompt Injection Attacks
Ahmed Salem, Andrew J. Paverd, Boris Köpf · 12 Dec 2023

Safety Alignment in NLP Tasks: Weakly Aligned Summarization as an In-Context Attack
Yu Fu, Yufei Li, Wen Xiao, Cong Liu, Yue Dong · AAML · 12 Dec 2023

METAL: Metamorphic Testing Framework for Analyzing Large-Language Model Qualities
Sangwon Hyun, Mingyu Guo, Muhammad Ali Babar · 11 Dec 2023

Analyzing the Inherent Response Tendency of LLMs: Real-World Instructions-Driven Jailbreak
Yanrui Du, Sendong Zhao, Ming Ma, Yuhan Chen, Bing Qin · 07 Dec 2023

Dr. Jekyll and Mr. Hyde: Two Faces of LLMs
Matteo Gioele Collu, Tom Janssen-Groesbeek, Stefanos Koffas, Mauro Conti, S. Picek · 06 Dec 2023

Unveiling the Implicit Toxicity in Large Language Models
Jiaxin Wen, Pei Ke, Hao Sun, Zhexin Zhang, Chengfei Li, Jinfeng Bai, Minlie Huang · 29 Nov 2023

Evil Geniuses: Delving into the Safety of LLM-based Agents
Yu Tian, Xiao Yang, Jingyuan Zhang, Yinpeng Dong, Hang Su · LLMAG, AAML · 20 Nov 2023

Assessing Prompt Injection Risks in 200+ Custom GPTs
Jiahao Yu, Yuhang Wu, Dong Shu, Mingyu Jin, Sabrina Yang, Xinyu Xing · 20 Nov 2023

Hijacking Large Language Models via Adversarial In-Context Learning
Yao Qiang, Xiangyu Zhou, Dongxiao Zhu · 16 Nov 2023

How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities
Lingbo Mo, Boshi Wang, Muhao Chen, Huan Sun · 15 Nov 2023

Alignment is not sufficient to prevent large language models from generating harmful information: A psychoanalytic perspective
Zi Yin, Wei Ding, Jia Liu · 14 Nov 2023

A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily
Peng Ding, Jun Kuang, Dan Ma, Xuezhi Cao, Yunsen Xian, Jiajun Chen, Shujian Huang · AAML · 14 Nov 2023

Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Wei Ping, Jinyuan Jia, Bo Li, Radha Poovendran · AAML · 07 Nov 2023

Do LLMs exhibit human-like response biases? A case study in survey design
Lindia Tjuatja, Valerie Chen, Sherry Tongshuang Wu, Ameet Talwalkar, Graham Neubig · 07 Nov 2023

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer, Olivia Watkins, Ethan Mendes, Justin Svegliato, Luke Bailey, ..., Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart J. Russell · 02 Nov 2023

The Mystery of In-Context Learning: A Comprehensive Survey on Interpretation and Analysis
Yuxiang Zhou, Jiazheng Li, Yanzheng Xiang, Hanqi Yan, Lin Gui, Yulan He · 01 Nov 2023

Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism
Mansi Sakarvadia, Arham Khan, Aswathy Ajith, Daniel Grzenda, Nathaniel Hudson, André Bauer, Kyle Chard, Ian Foster · 25 Oct 2023

Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition
Sander Schulhoff, Jeremy Pinto, Anaum Khan, Louis-François Bouchard, Chenglei Si, Svetlina Anati, Valen Tagliabue, Anson Liu Kost, Christopher Carnahan, Jordan L. Boyd-Graber · SILM · 24 Oct 2023

AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models
Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, A. Nenkova, Tong Sun · SILM, AAML · 23 Oct 2023

An LLM can Fool Itself: A Prompt-Based Adversarial Attack
Xilie Xu, Keyi Kong, Ning Liu, Li-zhen Cui, Di Wang, Jingfeng Zhang, Mohan S. Kankanhalli · AAML, SILM · 20 Oct 2023

Formalizing and Benchmarking Prompt Injection Attacks and Defenses
Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong · SILM, LLMAG · 19 Oct 2023

Attack Prompt Generation for Red Teaming and Defending Large Language Models
Boyi Deng, Wenjie Wang, Fuli Feng, Yang Deng, Qifan Wang, Xiangnan He · AAML · 19 Oct 2023

Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li, Yulin Chen, Jinglong Luo, Yan Kang, Xiaojin Zhang, Qi Hu, Chunkit Chan, Yangqiu Song · PILM · 16 Oct 2023

Prompt Packer: Deceiving LLMs through Compositional Instruction with Hidden Attacks
Shuyu Jiang, Xingshu Chen, Rui Tang · 16 Oct 2023

ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models
Alex Mei, Sharon Levy, William Yang Wang · AAML · 14 Oct 2023

CarExpert: Leveraging Large Language Models for In-Car Conversational Question Answering
Md. Rony, Christian Suess, Sinchana Ramakanth Bhat, Viju Sudhi, Julia Schneider, Maximilian Vogel, Roman Teucher, Ken E. Friedl, S. Sahoo · 14 Oct 2023

Ask Again, Then Fail: Large Language Models' Vacillations in Judgment
Qiming Xie, Zengzhi Wang, Yi Feng, Rui Xia · AAML, HILM · 03 Oct 2023

Building Privacy-Preserving and Secure Geospatial Artificial Intelligence Foundation Models
Jinmeng Rao, Song Gao, Gengchen Mai, Joanna M. Wardlaw · 29 Sep 2023

Goal-Oriented Prompt Attack and Safety Evaluation for LLMs
Chengyuan Liu, Fubang Zhao, Lizhi Qing, Yangyang Kang, Changlong Sun, Kun Kuang, Fei Wu · AAML · 21 Sep 2023

LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins
Umar Iqbal, Tadayoshi Kohno, Franziska Roesner · ELM, SILM · 19 Sep 2023

GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu, Xingwei Lin, Zheng Yu, Xinyu Xing · SILM · 19 Sep 2023

ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing
Ian Arawjo, Chelse Swoopes, Priyan Vaithilingam, Martin Wattenberg, Elena L. Glassman · LLMAG, LRM · 17 Sep 2023

Baseline Defenses for Adversarial Attacks Against Aligned Language Models
Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping Yeh-Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, Tom Goldstein · AAML · 01 Sep 2023

Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Luke Bailey, Euan Ong, Stuart J. Russell, Scott Emmons · VLM, MLLM · 01 Sep 2023

A Comprehensive Overview of Backdoor Attacks in Large Language Models within Communication Networks
Haomiao Yang, Kunlan Xiang, Mengyu Ge, Hongwei Li, Rongxing Lu, Shui Yu · SILM · 28 Aug 2023

Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
Maximilian Mozes, Xuanli He, Bennett Kleinberg, Lewis D. Griffin · 24 Aug 2023

Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions
Pouya Pezeshkpour, Estevam R. Hruschka · LRM · 22 Aug 2023

Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection
Zekun Li, Baolin Peng, Pengcheng He, Xifeng Yan · ELM, SILM, AAML · 17 Aug 2023

Self-Deception: Reverse Penetrating the Semantic Firewall of Large Language Models
Zhenhua Wang, Wei Xie, Kai Chen, Baosheng Wang, Zhiwen Gui, Enze Wang · AAML, SILM · 16 Aug 2023

GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher
Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Pinjia He, Shuming Shi, Zhaopeng Tu · SILM · 12 Aug 2023

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Xinyue Shen, Z. Chen, Michael Backes, Yun Shen, Yang Zhang · SILM · 07 Aug 2023

From Military to Healthcare: Adopting and Expanding Ethical Principles for Generative Artificial Intelligence
David Oniani, Jordan Hilsman, Yifan Peng, COL C. R. K. Poropatich, C. J. C. Pamplin, L. G. L. Legault, Yanshan Wang · AI4TS · 04 Aug 2023

From Prompt Injections to SQL Injection Attacks: How Protected is Your LLM-Integrated Web Application?
Rodrigo Pedro, Daniel Castro, Paulo Carreira, Nuno Santos · SILM, AAML · 03 Aug 2023

Is GPT-4 a reliable rater? Evaluating Consistency in GPT-4 Text Ratings
Veronika Hackl, Alexandra Elena Müller, Michael Granitzer, Maximilian Sailer · ALM · 03 Aug 2023

Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection
Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin · SILM · 31 Jul 2023

On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook
Mingyuan Fan, Chengyu Wang, Cen Chen, Yang Liu, Jun Huang · HILM · 31 Jul 2023

Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
Erfan Shayegani, Yue Dong, Nael B. Abu-Ghazaleh · 26 Jul 2023

Embedding Democratic Values into Social Media AIs via Societal Objective Functions
Chenyan Jia, Michelle S. Lam, Minh Chau Mai, Jeffrey T. Hancock, Michael S. Bernstein · 26 Jul 2023

Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, M. Shah, Ming Yang, Fahad Shahbaz Khan · VLM · 25 Jul 2023