Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs
arXiv:2404.14461 (v2, latest) · 22 April 2024
Javier Rando, Francesco Croce, Kryštof Mitka, Stepan Shabalin, Maksym Andriushchenko, Nicolas Flammarion, Florian Tramèr
Links: arXiv abstract · PDF · HTML

Papers citing "Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs" (9 of 9 papers shown):

1. AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment
   Pankayaraj Pathmanathan, Udari Madhushani Sehwag, Michael-Andrei Panaitescu-Liess, Furong Huang
   Topics: SILM, AAML · Citations: 0 · 15 Oct 2024

2. Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits
   Andis Draguns, Andrew Gritsevskiy, S. Motwani, Charlie Rogers-Smith, Jeffrey Ladish, Christian Schroeder de Witt
   Citations: 2 · 03 Jun 2024

3. Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
   Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
   Topics: AAML · Citations: 212 · 02 Apr 2024

4. Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and Echopraxia
   Guangyu Shen, Siyuan Cheng, Kaiyuan Zhang, Guanhong Tao, Shengwei An, Lu Yan, Zhuo Zhang, Shiqing Ma, Xiangyu Zhang
   Citations: 15 · 08 Feb 2024

5. Universal and Transferable Adversarial Attacks on Aligned Language Models
   Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson
   Citations: 1,498 · 27 Jul 2023

6. Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models
   Wenkai Yang, Lei Li, Zhiyuan Zhang, Xuancheng Ren, Xu Sun, Bin He
   Topics: SILM · Citations: 153 · 29 Mar 2021

7. Weight Poisoning Attacks on Pre-trained Models
   Keita Kurita, Paul Michel, Graham Neubig
   Topics: AAML, SILM · Citations: 451 · 14 Apr 2020

8. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning
   Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, Dawn Song
   Topics: AAML, SILM · Citations: 1,840 · 15 Dec 2017

9. Poisoning Attacks against Support Vector Machines
   Battista Biggio, Blaine Nelson, Pavel Laskov
   Topics: AAML · Citations: 1,593 · 27 Jun 2012