2405.20778
Improved Generation of Adversarial Examples Against Safety-aligned LLMs
28 May 2024
Qizhang Li
Yiwen Guo
Wangmeng Zuo
Hao Chen
AAML
SILM
Papers citing "Improved Generation of Adversarial Examples Against Safety-aligned LLMs"
22 / 22 papers shown
Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation
Qizhang Li
Xiaochen Yang
W. Zuo
Yiwen Guo
AAML
113
1
0
15 Oct 2024
PAL: Proxy-Guided Black-Box Attack on Large Language Models
Chawin Sitawarin
Norman Mu
David Wagner
Alexandre Araujo
ELM
49
34
0
15 Feb 2024
How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
Yi Zeng
Hongpeng Lin
Jingwen Zhang
Diyi Yang
Ruoxi Jia
Weiyan Shi
78
301
0
12 Jan 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun
Avi Caciularu
Adam Pearce
Lucas Dixon
Mor Geva
88
113
0
11 Jan 2024
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
282
1,449
0
27 Jul 2023
Making Substitute Models More Bayesian Can Enhance Transferability of Adversarial Examples
Qizhang Li
Yiwen Guo
W. Zuo
Hao Chen
AAML
73
37
0
10 Feb 2023
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa
MoMe
171
1,611
0
15 Dec 2022
Mass-Editing Memory in a Transformer
Kevin Meng
Arnab Sen Sharma
A. Andonian
Yonatan Belinkov
David Bau
KELM
VLM
105
583
0
13 Oct 2022
LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity
Martin Gubri
Maxime Cordy
Mike Papadakis
Yves Le Traon
Koushik Sen
AAML
61
53
0
26 Jul 2022
Improving Adversarial Transferability via Neuron Attribution-Based Attacks
Jianping Zhang
Weibin Wu
Jen-tse Huang
Yizhan Huang
Wenxuan Wang
Yuxin Su
Michael R. Lyu
AAML
71
134
0
31 Mar 2022
Locating and Editing Factual Associations in GPT
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
222
1,344
0
10 Feb 2022
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
131
651
0
07 Feb 2022
Towards Transferable Adversarial Attacks on Vision Transformers
Zhipeng Wei
Jingjing Chen
Micah Goldblum
Zuxuan Wu
Tom Goldstein
Yu-Gang Jiang
ViT
AAML
76
119
0
09 Sep 2021
Feature Importance-aware Transferable Adversarial Attacks
Peng Kuang
Hengchang Guo
Zhifei Zhang
Wenxin Liu
Zhan Qin
K. Ren
AAML
64
213
0
29 Jul 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
514
4,021
0
18 Apr 2021
Backpropagating Linearly Improves Transferability of Adversarial Examples
Yiwen Guo
Qizhang Li
Hao Chen
FedML
AAML
65
116
0
07 Dec 2020
Yet Another Intermediate-Level Attack
Qizhang Li
Yiwen Guo
Hao Chen
AAML
48
51
0
20 Aug 2020
Enhancing Adversarial Example Transferability with an Intermediate Level Attack
Qian Huang
Isay Katsman
Horace He
Zeqi Gu
Serge J. Belongie
Ser-Nam Lim
SILM
AAML
77
246
0
23 Jul 2019
Improving Transferability of Adversarial Examples with Input Diversity
Cihang Xie
Zhishuai Zhang
Yuyin Zhou
Song Bai
Jianyu Wang
Zhou Ren
Alan Yuille
AAML
97
1,116
0
19 Mar 2018
Delving into Transferable Adversarial Examples and Black-box Attacks
Yanpei Liu
Xinyun Chen
Chang-rui Liu
D. Song
AAML
133
1,737
0
08 Nov 2016
Adversarial Machine Learning at Scale
Alexey Kurakin
Ian Goodfellow
Samy Bengio
AAML
461
3,138
0
04 Nov 2016
Intriguing properties of neural networks
Christian Szegedy
Wojciech Zaremba
Ilya Sutskever
Joan Bruna
D. Erhan
Ian Goodfellow
Rob Fergus
AAML
253
14,912
1
21 Dec 2013