Ignore Previous Prompt: Attack Techniques For Language Models
Fábio Perez, Ian Ribeiro · SILM · 17 November 2022 · arXiv:2211.09527

Papers citing "Ignore Previous Prompt: Attack Techniques For Language Models"
50 / 284 papers shown
Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models?
Mohammad Bahrami Karkevandi, Nishant Vishwamitra, Peyman Najafirad · AAML · 05 Aug 2024

Defining and Evaluating Decision and Composite Risk in Language Models Applied to Natural Language Inference
Ke Shen, Mayank Kejriwal · 04 Aug 2024

Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Jingtong Su, Mingyu Lee, SangKeun Lee · 02 Aug 2024

Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
Sara Abdali, Jia He, C. Barberan, Richard Anarfi · 30 Jul 2024
LLMs can be Dangerous Reasoners: Analyzing-based Jailbreak Attack on Large Language Models
Shi Lin, Rongchang Li, Xun Wang, Changting Lin, Wenpeng Xing, Meng Han · 23 Jul 2024
LLMmap: Fingerprinting For Large Language Models
Dario Pasquini, Evgenios M. Kornaropoulos, G. Ateniese · 22 Jul 2024

Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models
Dong Shu, Mingyu Jin, Tianle Chen, Chong Zhang, Yongfeng Zhang · ELM, SILM · 12 Jul 2024

Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu · 12 Jul 2024

Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Riccardo Cantini, Giada Cosenza, A. Orsino, Domenico Talia · AAML · 11 Jul 2024

Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course
Cheng-Han Chiang, Wei-Chih Chen, Chun-Yi Kuan, Chienchou Yang, Hung-yi Lee · ELM, AI4Ed · 07 Jul 2024

Soft Begging: Modular and Efficient Shielding of LLMs against Prompt Injection and Jailbreaking based on Prompt Tuning
Simon Ostermann, Kevin Baum, Christoph Endres, Julia Masloh, P. Schramowski · AAML · 03 Jul 2024

JailbreakHunter: A Visual Analytics Approach for Jailbreak Prompts Discovery from Large-Scale Human-LLM Conversational Datasets
Zhihua Jin, Shiyi Liu, Haotian Li, Xun Zhao, Huamin Qu · 03 Jul 2024

LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models
Hayder Elesedy, Pedro M. Esperança, Silviu Vlad Oprea, Mete Ozay · KELM · 03 Jul 2024
Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models
Xavier Suau, Pieter Delobelle, Katherine Metcalf, Armand Joulin, N. Apostoloff, Luca Zappella, P. Rodríguez · MU, AAML · 02 Jul 2024

Enhancing the Capability and Robustness of Large Language Models through Reinforcement Learning-Driven Query Refinement
Zisu Huang, Xiaohua Wang, Feiran Zhang, Zhibo Xu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang · AAML, LRM · 01 Jul 2024

Monitoring Latent World States in Language Models with Propositional Probes
Jiahai Feng, Stuart Russell, Jacob Steinhardt · HILM · 27 Jun 2024

Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam · CLL, KELM · 26 Jun 2024

Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data
Nahema Marchal, Rachel Xu, Rasmi Elasmar, Iason Gabriel, Beth Goldberg, William S. Isaac · LLMAG · 19 Jun 2024

AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents
Edoardo Debenedetti, Jie Zhang, Mislav Balunović, Luca Beurer-Kellner, Marc Fischer, Florian Tramèr · LLMAG, AAML · 19 Jun 2024

Supporting Human Raters with the Detection of Harmful Content using Large Language Models
Kurt Thomas, Patrick Gage Kelley, David Tao, Sarah Meiklejohn, Owen Vallis, Shunwen Tan, Blaz Bratanic, Felipe Tiengo Ferreira, Vijay Eranti, Elie Bursztein · 18 Jun 2024
garak: A Framework for Security Probing Large Language Models
Leon Derczynski, Erick Galinkin, Jeffrey Martin, Subho Majumdar, Nanna Inie · AAML, ELM · 16 Jun 2024

Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications
Stephen Burabari Tete · 16 Jun 2024

Security of AI Agents
Yifeng He, Ethan Wang, Yuyang Rong, Zifei Cheng, Hao Chen · LLMAG · 12 Jun 2024

Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey
Shang Wang, Tianqing Zhu, Bo Liu, Ming Ding, Xu Guo, Dayong Ye, Wanlei Zhou, Philip S. Yu · PILM · 12 Jun 2024
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
Edoardo Debenedetti, Javier Rando, Daniel Paleka, Silaghi Fineas Florin, Dragos Albastroiu, ..., Stefan Kraft, Mario Fritz, Florian Tramèr, Sahar Abdelnabi, Lea Schönherr · 12 Jun 2024
Evaluating Contextually Personalized Programming Exercises Created with Generative AI
E. Logacheva, Arto Hellas, James Prather, Sami Sarsa, Juho Leinonen · 11 Jun 2024

Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications
Junlin Wang, Tianyi Yang, Roy Xie, Bhuwan Dhingra · SILM, AAML · 10 Jun 2024

Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents
Avital Shafran, R. Schuster, Vitaly Shmatikov · 09 Jun 2024

Assessing LLMs for Zero-shot Abstractive Summarization Through the Lens of Relevance Paraphrasing
Hadi Askari, Anshuman Chhabra, Muhao Chen, Prasant Mohapatra · 06 Jun 2024

Defending Large Language Models Against Attacks With Residual Stream Activation Analysis
Amelia Kawasaki, Andrew Davis, Houssam Abbas · AAML, KELM · 05 Jun 2024

AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways
Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, Yang Xiang · 04 Jun 2024

Safeguarding Large Language Models: A Survey
Yi Dong, Ronghui Mu, Yanghao Zhang, Siqi Sun, Tianle Zhang, ..., Yi Qi, Jinwei Hu, Jie Meng, Saddek Bensalem, Xiaowei Huang · OffRL, KELM, AILaw · 03 Jun 2024

PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration
Ziqian Zeng, Jianwei Wang, Zhengdong Lu, Huiping Zhuang, Cen Chen · RALM, KELM · 03 Jun 2024

BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards
Diego Dorn, Alexandre Variengien, Charbel-Raphaël Ségerie, Vincent Corruble · 03 Jun 2024
Exfiltration of personal information from ChatGPT via prompt injection
Gregory Schwartzman · SILM · 31 May 2024

Voice Jailbreak Attacks Against GPT-4o
Xinyue Shen, Yixin Wu, Michael Backes, Yang Zhang · AuLLM · 29 May 2024

Semantic-guided Prompt Organization for Universal Goal Hijacking against LLMs
Yihao Huang, Chong Wang, Xiaojun Jia, Qing-Wu Guo, Felix Juefei Xu, Jian Zhang, G. Pu, Yang Liu · 23 May 2024

Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming
Jiaxu Liu, Xiangyu Yin, Sihao Wu, Jianhong Wang, Meng Fang, Xinping Yi, Xiaowei Huang · 21 May 2024

PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition
Ziyang Zhang, Qizhen Zhang, Jakob N. Foerster · AAML · 13 May 2024

PLeak: Prompt Leaking Attacks against Large Language Model Applications
Bo Hui, Haolin Yuan, Neil Gong, Philippe Burlina, Yinzhi Cao · LLMAG, AAML, SILM · 10 May 2024

Locally Differentially Private In-Context Learning
Chunyan Zheng, Keke Sun, Wenhao Zhao, Haibo Zhou, Lixin Jiang, Shaoyang Song, Chunlai Zhou · 07 May 2024

From Persona to Personalization: A Survey on Role-Playing Language Agents
Jiangjie Chen, Xintao Wang, Rui Xu, Siyu Yuan, Yikai Zhang, ..., Caiyu Hu, Siye Wu, Scott Ren, Ziquan Fu, Yanghua Xiao · 28 Apr 2024
Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs
Valeriia Cherepanova, James Zou · AAML · 26 Apr 2024

AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
Anselm Paulus, Arman Zharmagambetov, Chuan Guo, Brandon Amos, Yuandong Tian · AAML · 21 Apr 2024

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Eric Wallace, Kai Y. Xiao, R. Leike, Lilian Weng, Johannes Heidecke, Alex Beutel · SILM · 19 Apr 2024

Offset Unlearning for Large Language Models
James Y. Huang, Wenxuan Zhou, Fei Wang, Fred Morstatter, Sheng Zhang, Hoifung Poon, Muhao Chen · MU · 17 Apr 2024

JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
Yingchaojie Feng, Zhizhang Chen, Zhining Kang, Sijia Wang, Minfeng Zhu, Wei Zhang, Wei Chen · 12 Apr 2024
GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications
Shishir G. Patil, Tianjun Zhang, Vivian Fang, Noppapon C., Roy Huang, Aaron Hao, Martin Casado, Joseph E. Gonzalez, Raluca Ada Popa, Ion Stoica · ALM · 10 Apr 2024
Sandwich attack: Multi-language Mixture Adaptive Attack on LLMs
Bibek Upadhayay, Vahid Behzadan · AAML · 09 Apr 2024

Goal-guided Generative Prompt Injection Attack on Large Language Models
Chong Zhang, Mingyu Jin, Qinkai Yu, Chengzhi Liu, Haochen Xue, Xiaobo Jin · AAML, SILM · 06 Apr 2024