Efficient Universal Goal Hijacking with Semantics-guided Prompt Organization
arXiv:2405.14189 · 23 May 2024
Yihao Huang, Chong Wang, Xiaojun Jia, Qing Guo, Felix Juefei-Xu, Jian Zhang, Geguang Pu, Yang Liu
Papers citing "Efficient Universal Goal Hijacking with Semantics-guided Prompt Organization" (33 papers shown)
Defense Against Prompt Injection Attack by Leveraging Attack Techniques
Yulin Chen, Haoran Li, Zihao Zheng, Yangqiu Song, Dekai Wu, Bryan Hooi
SILM, AAML · 01 Nov 2024 · 70 / 5 / 0

Automatic and Universal Prompt Injection Attacks against Large Language Models
Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, Chaowei Xiao
SILM, AAML · 07 Mar 2024 · 51 / 40 / 0

Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao
ELM · 14 Feb 2024 · 70 / 58 / 0

Hijacking Context in Large Multi-modal Models
Joonhyun Jeong
MLLM · 07 Dec 2023 · 68 / 6 / 0

Hijacking Large Language Models via Adversarial In-Context Learning
Yao Qiang, Xiangyu Zhou, Saleh Zare Zade, Prashant Khanduri, Dongxiao Zhu
16 Nov 2023 · 71 / 34 / 0

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer, Olivia Watkins, Ethan Mendes, Justin Svegliato, Luke Bailey, ..., Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart J. Russell
02 Nov 2023 · 48 / 74 / 0
Formalizing and Benchmarking Prompt Injection Attacks and Defenses
Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong
SILM, LLMAG · 19 Oct 2023 · 39 / 71 / 0

Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
Maximilian Mozes, Xuanli He, Bennett Kleinberg, Lewis D. Griffin
24 Aug 2023 · 44 / 81 / 0

Protect Federated Learning Against Backdoor Attacks via Data-Free Trigger Generation
Yanxin Yang, Ming Hu, Yue Cao, Jun Xia, Yihao Huang, Yang Liu, Mingsong Chen
FedML · 22 Aug 2023 · 67 / 6 / 0

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Xinyue Shen, Zhenpeng Chen, Michael Backes, Yun Shen, Yang Zhang
SILM · 07 Aug 2023 · 73 / 264 / 0

Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson
27 Jul 2023 · 141 / 1,376 / 0

Prompt Injection attack against LLM-integrated Applications
Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, ..., Tianwei Zhang, Yepang Liu, Haoyu Wang, Yanhong Zheng, Yang Liu
SILM · 08 Jun 2023 · 77 / 341 / 0
Personalization as a Shortcut for Few-Shot Backdoor Attack against Text-to-Image Diffusion Models
Yihao Huang, Felix Juefei-Xu, Qing Guo, Jie M. Zhang, Yutong Wu, Ming Hu, Tianlin Li, Geguang Pu, Yang Liu
DiffM · 18 May 2023 · 72 / 33 / 0

OpenAssistant Conversations – Democratizing Large Language Model Alignment
Andreas Köpf, Yannic Kilcher, Dimitri von Rütte, Sotiris Anagnostidis, Zhi Rui Tam, ..., Arnav Dantuluri, Andrew Maguire, Christoph Schuhmann, Huu Nguyen, A. Mattick
ALM, LM&MA · 14 Apr 2023 · 69 / 611 / 0

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, C. Endres, Thorsten Holz, Mario Fritz
SILM · 23 Feb 2023 · 90 / 462 / 0

Large Language Models Can Be Easily Distracted by Irrelevant Context
Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed H. Chi, Nathanael Schärli, Denny Zhou
ReLM, RALM, LRM · 31 Jan 2023 · 65 / 564 / 0

Ignore Previous Prompt: Attack Techniques For Language Models
Fábio Perez, Ian Ribeiro
SILM · 17 Nov 2022 · 69 / 420 / 0

Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples
Hezekiah J. Branch, Jonathan Rodriguez Cefalu, Jeremy McHugh, Leyla Hujer, Aditya Bahl, Daniel del Castillo Iglesias, Ron Heichman, Ramesh Darwishi
ELM, SILM, AAML · 05 Sep 2022 · 22 / 51 / 0
Solving Quantitative Reasoning Problems with Language Models
Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, ..., Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra
ReLM, ELM, LRM · 29 Jun 2022 · 114 / 793 / 0

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, ..., Jack Clark, Sam McCandlish, C. Olah, Benjamin Mann, Jared Kaplan
12 Apr 2022 · 181 / 2,457 / 0

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM · 04 Mar 2022 · 638 / 12,525 / 0

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
LM&Ro, LRM, AI4CE, ReLM · 28 Jan 2022 · 533 / 9,009 / 0

Design Guidelines for Prompt Engineering Text-to-Image Generative Models
Vivian Liu, Lydia B. Chilton
14 Sep 2021 · 37 / 482 / 0

TruthfulQA: Measuring How Models Mimic Human Falsehoods
Stephanie C. Lin, Jacob Hilton, Owain Evans
HILM · 08 Sep 2021 · 85 / 1,825 / 0
Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le
ALM, UQCV · 03 Sep 2021 · 53 / 3,678 / 0

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith
24 Sep 2020 · 104 / 1,168 / 0

Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
BDL · 28 May 2020 · 388 / 41,106 / 0

Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving
ALM · 18 Sep 2019 · 405 / 1,664 / 0

Universal Adversarial Triggers for Attacking and Analyzing NLP
Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh
AAML, SILM · 20 Aug 2019 · 80 / 856 / 0

Towards Deep Learning Models Resistant to Adversarial Attacks
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu
SILM, OOD · 19 Jun 2017 · 193 / 11,962 / 0
Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
3DV · 12 Jun 2017 · 319 / 129,831 / 0

Universal adversarial perturbations
Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, P. Frossard
AAML · 26 Oct 2016 · 105 / 2,520 / 0

Explaining and Harnessing Adversarial Examples
Ian Goodfellow, Jonathon Shlens, Christian Szegedy
AAML, GAN · 20 Dec 2014 · 130 / 18,922 / 0