ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2405.20778
  4. Cited By
Improved Generation of Adversarial Examples Against Safety-aligned LLMs

Improved Generation of Adversarial Examples Against Safety-aligned LLMs

28 May 2024
Qizhang Li
Yiwen Guo
Wangmeng Zuo
Hao Chen
    AAML
    SILM
ArXivPDFHTML

Papers citing "Improved Generation of Adversarial Examples Against Safety-aligned LLMs"

22 / 22 papers shown
Title
Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation
Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation
Qizhang Li
Xiaochen Yang
W. Zuo
Yiwen Guo
AAML
113
1
0
15 Oct 2024
PAL: Proxy-Guided Black-Box Attack on Large Language Models
PAL: Proxy-Guided Black-Box Attack on Large Language Models
Chawin Sitawarin
Norman Mu
David Wagner
Alexandre Araujo
ELM
49
34
0
15 Feb 2024
How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to
  Challenge AI Safety by Humanizing LLMs
How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
Yi Zeng
Hongpeng Lin
Jingwen Zhang
Diyi Yang
Ruoxi Jia
Weiyan Shi
78
301
0
12 Jan 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations
  of Language Models
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun
Avi Caciularu
Adam Pearce
Lucas Dixon
Mor Geva
88
113
0
11 Jan 2024
Universal and Transferable Adversarial Attacks on Aligned Language
  Models
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
282
1,449
0
27 Jul 2023
Making Substitute Models More Bayesian Can Enhance Transferability of
  Adversarial Examples
Making Substitute Models More Bayesian Can Enhance Transferability of Adversarial Examples
Qizhang Li
Yiwen Guo
W. Zuo
Hao Chen
AAML
73
37
0
10 Feb 2023
Constitutional AI: Harmlessness from AI Feedback
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa
MoMe
171
1,611
0
15 Dec 2022
Mass-Editing Memory in a Transformer
Mass-Editing Memory in a Transformer
Kevin Meng
Arnab Sen Sharma
A. Andonian
Yonatan Belinkov
David Bau
KELM
VLM
105
583
0
13 Oct 2022
LGV: Boosting Adversarial Example Transferability from Large Geometric
  Vicinity
LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity
Martin Gubri
Maxime Cordy
Mike Papadakis
Yves Le Traon
Koushik Sen
AAML
61
53
0
26 Jul 2022
Improving Adversarial Transferability via Neuron Attribution-Based
  Attacks
Improving Adversarial Transferability via Neuron Attribution-Based Attacks
Jianping Zhang
Weibin Wu
Jen-tse Huang
Yizhan Huang
Wenxuan Wang
Yuxin Su
Michael R. Lyu
AAML
71
134
0
31 Mar 2022
Locating and Editing Factual Associations in GPT
Locating and Editing Factual Associations in GPT
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
222
1,344
0
10 Feb 2022
Red Teaming Language Models with Language Models
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
131
651
0
07 Feb 2022
Towards Transferable Adversarial Attacks on Vision Transformers
Towards Transferable Adversarial Attacks on Vision Transformers
Zhipeng Wei
Jingjing Chen
Micah Goldblum
Zuxuan Wu
Tom Goldstein
Yu-Gang Jiang
ViT
AAML
76
119
0
09 Sep 2021
Feature Importance-aware Transferable Adversarial Attacks
Feature Importance-aware Transferable Adversarial Attacks
Peng Kuang
Hengchang Guo
Zhifei Zhang
Wenxin Liu
Zhan Qin
K. Ren
AAML
64
213
0
29 Jul 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
514
4,021
0
18 Apr 2021
Backpropagating Linearly Improves Transferability of Adversarial
  Examples
Backpropagating Linearly Improves Transferability of Adversarial Examples
Yiwen Guo
Qizhang Li
Hao Chen
FedML
AAML
65
116
0
07 Dec 2020
Yet Another Intermediate-Level Attack
Yet Another Intermediate-Level Attack
Qizhang Li
Yiwen Guo
Hao Chen
AAML
48
51
0
20 Aug 2020
Enhancing Adversarial Example Transferability with an Intermediate Level
  Attack
Enhancing Adversarial Example Transferability with an Intermediate Level Attack
Qian Huang
Isay Katsman
Horace He
Zeqi Gu
Serge J. Belongie
Ser-Nam Lim
SILM
AAML
77
246
0
23 Jul 2019
Improving Transferability of Adversarial Examples with Input Diversity
Improving Transferability of Adversarial Examples with Input Diversity
Cihang Xie
Zhishuai Zhang
Yuyin Zhou
Song Bai
Jianyu Wang
Zhou Ren
Alan Yuille
AAML
97
1,116
0
19 Mar 2018
Delving into Transferable Adversarial Examples and Black-box Attacks
Delving into Transferable Adversarial Examples and Black-box Attacks
Yanpei Liu
Xinyun Chen
Chang-rui Liu
D. Song
AAML
133
1,737
0
08 Nov 2016
Adversarial Machine Learning at Scale
Adversarial Machine Learning at Scale
Alexey Kurakin
Ian Goodfellow
Samy Bengio
AAML
461
3,138
0
04 Nov 2016
Intriguing properties of neural networks
Intriguing properties of neural networks
Christian Szegedy
Wojciech Zaremba
Ilya Sutskever
Joan Bruna
D. Erhan
Ian Goodfellow
Rob Fergus
AAML
253
14,912
1
21 Dec 2013
1