
Universal and Transferable Adversarial Attacks on Aligned Language Models

27 July 2023
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
arXiv (abs) · PDF · HTML · GitHub (3,937★)

Papers citing "Universal and Transferable Adversarial Attacks on Aligned Language Models"

Showing 50 of 1,101 citing papers.
Large Language Models for Conducting Advanced Text Analytics Information Systems Research
Benjamin Ampel
Chi-Heng Yang
Junjie Hu
Hsinchun Chen
118
8
0
27 Dec 2023
Alleviating Hallucinations of Large Language Models through Induced Hallucinations
Yue Zhang
Leyang Cui
Wei Bi
Shuming Shi
HILM
108
57
0
25 Dec 2023
MetaAID 2.5: A Secure Framework for Developing Metaverse Applications via Large Language Models
Hongyin Zhu
80
6
0
22 Dec 2023
Asymmetric Bias in Text-to-Image Generation with Adversarial Attacks
Haz Sameen Shahgir
Xianghao Kong
Greg Ver Steeg
Yue Dong
62
4
0
22 Dec 2023
Training Neural Networks with Internal State, Unconstrained Connectivity, and Discrete Activations
Alexander Grushin
AI4CE
18
0
0
22 Dec 2023
Exploiting Novel GPT-4 APIs
Kellin Pelrine
Mohammad Taufeeque
Michał Zając
Euan McLean
Adam Gleave
SILM
62
21
0
21 Dec 2023
Bypassing the Safety Training of Open-Source LLMs with Priming Attacks
Jason Vega
Isha Chaudhary
Changming Xu
Gagandeep Singh
AAML
86
24
0
19 Dec 2023
Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models
Jiawei Zhao
Kejiang Chen
Xianjian Yuan
Yuang Qi
Weiming Zhang
Neng H. Yu
97
9
0
15 Dec 2023
The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation
Rongwu Xu
Brian S. Lin
Shujian Yang
Tianqi Zhang
Weiyan Shi
Tianwei Zhang
Zhixuan Fang
Wei Xu
Han Qiu
160
61
0
14 Dec 2023
Exploring Transferability for Randomized Smoothing
Kai Qiu
Huishuai Zhang
Zhirong Wu
Stephen Lin
AAML
50
1
0
14 Dec 2023
Causality Analysis for Evaluating the Security of Large Language Models
Wei Zhao
Zhe Li
Junfeng Sun
68
12
0
13 Dec 2023
Maatphor: Automated Variant Analysis for Prompt Injection Attacks
Ahmed Salem
Andrew Paverd
Boris Köpf
171
10
0
12 Dec 2023
AI Control: Improving Safety Despite Intentional Subversion
Ryan Greenblatt
Buck Shlegeris
Kshitij Sachan
Fabien Roger
101
54
0
12 Dec 2023
Safety Alignment in NLP Tasks: Weakly Aligned Summarization as an In-Context Attack
Yu Fu
Yufei Li
Wen Xiao
Cong Liu
Yue Dong
AAML
105
5
0
12 Dec 2023
METAL: Metamorphic Testing Framework for Analyzing Large-Language Model Qualities
Sangwon Hyun
Mingyu Guo
Muhammad Ali Babar
75
10
0
11 Dec 2023
NLLG Quarterly arXiv Report 09/23: What are the most influential current AI Papers?
Ran Zhang
Aida Kostikova
Christoph Leiter
Jonas Belouadi
Daniil Larionov
Yanran Chen
Vivian Fresen
Steffen Eger
81
0
0
09 Dec 2023
Make Them Spill the Beans! Coercive Knowledge Extraction from (Production) LLMs
Zhuo Zhang
Guangyu Shen
Guanhong Tao
Shuyang Cheng
Xiangyu Zhang
103
14
0
08 Dec 2023
Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks
Shuli Jiang
S. Kadhe
Yi Zhou
Ling Cai
Nathalie Baracaldo
SILM AAML
72
13
0
07 Dec 2023
Analyzing the Inherent Response Tendency of LLMs: Real-World Instructions-Driven Jailbreak
Yanrui Du
Sendong Zhao
Ming Ma
Yuhan Chen
Bing Qin
71
16
0
07 Dec 2023
LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem
Yingqiang Ge
Yujie Ren
Wenyue Hua
Shuyuan Xu
Juntao Tan
Yongfeng Zhang
LLMAG
68
30
0
06 Dec 2023
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
Xuanming Cui
Alejandro Aparcedo
Young Kyun Jang
Ser-Nam Lim
AAML VLM
97
47
0
06 Dec 2023
Scaling Laws for Adversarial Attacks on Language Model Activations
Stanislav Fort
63
16
0
05 Dec 2023
Prompt Optimization via Adversarial In-Context Learning
Do Xuan Long
Yiran Zhao
Hannah Brown
Yuxi Xie
James Xu Zhao
Nancy F. Chen
Kenji Kawaguchi
Michael Qizhe Xie
Junxian He
152
16
0
05 Dec 2023
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Anay Mehrotra
Manolis Zampetakis
Paul Kassianik
Blaine Nelson
Hyrum Anderson
Yaron Singer
Amin Karbasi
95
272
0
04 Dec 2023
A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly
Yifan Yao
Jinhao Duan
Kaidi Xu
Yuanfang Cai
Eric Sun
Yue Zhang
PILM ELM
125
561
0
04 Dec 2023
Distilled Self-Critique of LLMs with Synthetic Data: a Bayesian Perspective
Víctor Gallego
41
4
0
04 Dec 2023
VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models
Xiang Li
Qianli Shen
Kenji Kawaguchi
74
5
0
29 Nov 2023
MMA-Diffusion: MultiModal Attack on Diffusion Models
Yijun Yang
Ruiyuan Gao
Xiaosen Wang
Tsung-Yi Ho
Nan Xu
Qiang Xu
91
77
0
29 Nov 2023
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
Haoqin Tu
Chenhang Cui
Zijun Wang
Yiyang Zhou
Bingchen Zhao
Junlin Han
Wangchunshu Zhou
Huaxiu Yao
Cihang Xie
MLLM
128
82
0
27 Nov 2023
Universal Jailbreak Backdoors from Poisoned Human Feedback
Javier Rando
Florian Tramèr
121
75
0
24 Nov 2023
Transfer Attacks and Defenses for Large Language Models on Coding Tasks
Chi Zhang
Zifan Wang
Ravi Mangal
Matt Fredrikson
Limin Jia
Corina S. Pasareanu
AAML SILM
69
1
0
22 Nov 2023
Evil Geniuses: Delving into the Safety of LLM-based Agents
Yu Tian
Xiao Yang
Jingyuan Zhang
Yinpeng Dong
Hang Su
LLMAG AAML
100
67
0
20 Nov 2023
Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information
Zhengmian Hu
Gang Wu
Saayan Mitra
Ruiyi Zhang
Tong Sun
Heng Huang
Vishy Swaminathan
99
27
0
20 Nov 2023
Beyond Boundaries: A Comprehensive Survey of Transferable Attacks on AI Systems
Guangjing Wang
Ce Zhou
Yuanda Wang
Bocheng Chen
Hanqing Guo
Qiben Yan
AAML SILM
137
3
0
20 Nov 2023
Cognitive Overload: Jailbreaking Large Language Models with Overloaded Logical Thinking
Nan Xu
Fei Wang
Ben Zhou
Bangzheng Li
Chaowei Xiao
Muhao Chen
111
60
0
16 Nov 2023
Automatic Engineering of Long Prompts
Cho-Jui Hsieh
Si Si
Felix X. Yu
Inderjit S. Dhillon
VLM
86
9
0
16 Nov 2023
Bergeron: Combating Adversarial Attacks through a Conscience-Based Alignment Framework
Matthew Pisano
Peter Ly
Abraham Sanders
Bingsheng Yao
Dakuo Wang
T. Strzalkowski
Mei Si
AAML
66
26
0
16 Nov 2023
Hijacking Large Language Models via Adversarial In-Context Learning
Yao Qiang
Xiangyu Zhou
Saleh Zare Zade
Prashant Khanduri
Dongxiao Zhu
116
35
0
16 Nov 2023
Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections
Yuanpu Cao
Bochuan Cao
Jinghui Chen
92
28
0
15 Nov 2023
How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities
Lingbo Mo
Boshi Wang
Muhao Chen
Huan Sun
82
29
0
15 Nov 2023
Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts
Yuanwei Wu
Xiang Li
Yixin Liu
Pan Zhou
Lichao Sun
99
65
0
15 Nov 2023
Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
Zhexin Zhang
Junxiao Yang
Pei Ke
Fei Mi
Hongning Wang
Minlie Huang
AAML
91
133
0
15 Nov 2023
Alignment is not sufficient to prevent large language models from generating harmful information: A psychoanalytic perspective
Zi Yin
Wei Ding
Jia Liu
67
1
0
14 Nov 2023
A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily
Peng Ding
Jun Kuang
Dan Ma
Xuezhi Cao
Yunsen Xian
Jiajun Chen
Shujian Huang
AAML
94
122
0
14 Nov 2023
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming
Suyu Ge
Chunting Zhou
Rui Hou
Madian Khabsa
Yi-Chia Wang
Qifan Wang
Jiawei Han
Yuning Mao
AAML LRM
88
104
0
13 Nov 2023
Prompts have evil twins
Rimon Melamed
Lucas H. McCabe
T. Wakhare
Yejin Kim
H. H. Huang
Enric Boix-Adsera
53
3
0
13 Nov 2023
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Sheng Liu
Haotian Ye
Lei Xing
James Y. Zou
127
117
0
11 Nov 2023
Intentional Biases in LLM Responses
Nicklaus Badyal
Derek Jacoby
Yvonne Coady
55
5
0
11 Nov 2023
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong
Delong Ran
Jinyuan Liu
Conglei Wang
Tianshuo Cong
Anyu Wang
Sisi Duan
Xiaoyun Wang
MLLM
235
160
0
09 Nov 2023
Conversational AI Threads for Visualizing Multidimensional Datasets
Matt-Heun Hong
Anamaria Crisan
77
9
0
09 Nov 2023