arXiv: 2302.05733
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks
11 February 2023
Daniel Kang
Xuechen Li
Ion Stoica
Carlos Guestrin
Matei A. Zaharia
Tatsunori Hashimoto
AAML
Papers citing "Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks"
Showing 50 of 177 citing papers.
Hide Your Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Carrier Articles
Zhilong Wang
Haizhou Wang
Nanqing Luo
Lan Zhang
Xiaoyan Sun
Yebo Cao
Peng Liu
25
0
0
20 Aug 2024
Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks
Kexin Chen
Yi Liu
Donghai Hong
Jiaying Chen
Wenhai Wang
44
1
0
18 Aug 2024
Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models
Zi Liang
Haibo Hu
Qingqing Ye
Yaxin Xiao
Haoyang Li
AAML
ELM
SILM
48
6
0
05 Aug 2024
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Jingtong Su
Mingyu Lee
SangKeun Lee
43
8
0
02 Aug 2024
Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
Sara Abdali
Jia He
C. Barberan
Richard Anarfi
36
7
0
30 Jul 2024
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma
Satyapriya Krishna
Sebastian Gehrmann
Madhavan Seshadri
Anu Pradhan
Tom Ault
Leslie Barrett
David Rabinowitz
John Doucette
Nhathai Phan
57
10
0
20 Jul 2024
BadRobot: Jailbreaking Embodied LLMs in the Physical World
Hangtao Zhang
Chenyu Zhu
Xianlong Wang
Ziqi Zhou
Yichen Wang
...
Shengshan Hu
Leo Yu Zhang
Aishan Liu
Peijin Guo
LM&Ro
53
7
0
16 Jul 2024
Jailbreak Attacks and Defenses Against Large Language Models: A Survey
Sibo Yi
Yule Liu
Zhen Sun
Tianshuo Cong
Xinlei He
Jiaxing Song
Ke Xu
Qi Li
AAML
39
80
0
05 Jul 2024
JailbreakHunter: A Visual Analytics Approach for Jailbreak Prompts Discovery from Large-Scale Human-LLM Conversational Datasets
Zhihua Jin
Shiyi Liu
Haotian Li
Xun Zhao
Huamin Qu
50
3
0
03 Jul 2024
LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models
Hayder Elesedy
Pedro M. Esperança
Silviu Vlad Oprea
Mete Ozay
KELM
36
2
0
03 Jul 2024
From Theft to Bomb-Making: The Ripple Effect of Unlearning in Defending Against Jailbreak Attacks
Zhexin Zhang
Junxiao Yang
Pei Ke
Shiyao Cui
Chujie Zheng
Hongning Wang
Minlie Huang
MU
AAML
64
26
0
03 Jul 2024
Poisoned LangChain: Jailbreak LLMs by LangChain
Ziqiu Wang
Jun Liu
Shengkai Zhang
Yang Yang
31
7
0
26 Jun 2024
JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models
Haibo Jin
Leyang Hu
Xinuo Li
Peiyan Zhang
Chonghan Chen
Jun Zhuang
Haohan Wang
PILM
36
26
0
26 Jun 2024
From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking
Siyuan Wang
Zhuohan Long
Zhihao Fan
Zhongyu Wei
42
6
0
21 Jun 2024
AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents
Edoardo Debenedetti
Jie Zhang
Mislav Balunović
Luca Beurer-Kellner
Marc Fischer
Florian Tramèr
LLMAG
AAML
56
26
1
19 Jun 2024
Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis
Yuping Lin
Pengfei He
Han Xu
Yue Xing
Makoto Yamada
Hui Liu
Jiliang Tang
34
10
0
16 Jun 2024
Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models
Sarah Ball
Frauke Kreuter
Nina Rimsky
40
13
0
13 Jun 2024
Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey
Shang Wang
Tianqing Zhu
Bo Liu
Ming Ding
Xu Guo
Dayong Ye
Wanlei Zhou
Philip S. Yu
PILM
67
17
0
12 Jun 2024
We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs
Joseph Spracklen
Raveen Wijewickrama
A. H. M. N. Sakib
Anindya Maiti
Murtuza Jadliwala
45
10
0
12 Jun 2024
Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models
Kalyan Nakka
Jimmy Dani
Nitesh Saxena
45
1
0
08 Jun 2024
Measure-Observe-Remeasure: An Interactive Paradigm for Differentially-Private Exploratory Analysis
Priyanka Nanayakkara
Hyeok Kim
Yifan Wu
Ali Sarvghad
Narges Mahyar
G. Miklau
Jessica Hullman
31
17
0
04 Jun 2024
Safeguarding Large Language Models: A Survey
Yi Dong
Ronghui Mu
Yanghao Zhang
Siqi Sun
Tianle Zhang
...
Yi Qi
Jinwei Hu
Jie Meng
Saddek Bensalem
Xiaowei Huang
OffRL
KELM
AILaw
35
19
0
03 Jun 2024
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
Richard Fang
Antony Kellermann
Akul Gupta
Qiusi Zhan
R. Bindu
Daniel Kang
LLMAG
40
30
0
02 Jun 2024
Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
Xiaojun Jia
Tianyu Pang
Chao Du
Yihao Huang
Jindong Gu
Yang Liu
Xiaochun Cao
Min-Bin Lin
AAML
52
22
0
31 May 2024
Voice Jailbreak Attacks Against GPT-4o
Xinyue Shen
Yixin Wu
Michael Backes
Yang Zhang
AuLLM
40
9
0
29 May 2024
MoGU: A Framework for Enhancing Safety of Open-Sourced LLMs While Preserving Their Usability
Yanrui Du
Sendong Zhao
Danyang Zhao
Ming Ma
Yuhan Chen
Liangyu Huo
Qing Yang
Dongliang Xu
Bing Qin
33
6
0
23 May 2024
Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming
Jiaxu Liu
Xiangyu Yin
Sihao Wu
Jianhong Wang
Meng Fang
Xinping Yi
Xiaowei Huang
34
4
0
21 May 2024
Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs
Valeriia Cherepanova
James Zou
AAML
33
4
0
26 Apr 2024
Fake Artificial Intelligence Generated Contents (FAIGC): A Survey of Theories, Detection Methods, and Opportunities
Xiaomin Yu
Yezhaohui Wang
Yanfang Chen
Zhen Tao
Dinghao Xi
Shichao Song
Simin Niu
Zhiyu Li
67
8
0
25 Apr 2024
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
Anselm Paulus
Arman Zharmagambetov
Chuan Guo
Brandon Amos
Yuandong Tian
AAML
55
56
0
21 Apr 2024
Uncovering Safety Risks of Large Language Models through Concept Activation Vector
Zhihao Xu
Ruixuan Huang
Changyu Chen
Shuai Wang
Xiting Wang
LLMSV
34
10
0
18 Apr 2024
LLMs for Cyber Security: New Opportunities
D. Divakaran
Sai Teja Peddinti
24
11
0
17 Apr 2024
JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
Yingchaojie Feng
Zhizhang Chen
Zhining Kang
Sijia Wang
Minfeng Zhu
Wei Zhang
Wei Chen
42
3
0
12 Apr 2024
LLM Agents can Autonomously Exploit One-day Vulnerabilities
Richard Fang
R. Bindu
Akul Gupta
Daniel Kang
SILM
LLMAG
78
55
0
11 Apr 2024
GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications
Shishir G. Patil
Tianjun Zhang
Vivian Fang
Noppapon C.
Roy Huang
Aaron Hao
Martin Casado
Joseph E. Gonzalez
Raluca Ada Popa
Ion Stoica
ALM
34
10
0
10 Apr 2024
Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game
Qianqiao Xu
Zhiliang Tian
Hongyan Wu
Zhen Huang
Yiping Song
Feng Liu
Dongsheng Li
LLMAG
AAML
36
2
0
03 Apr 2024
Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models
Zhiyuan Yu
Xiaogeng Liu
Shunning Liang
Zach Cameron
Chaowei Xiao
Ning Zhang
30
40
0
26 Mar 2024
Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices
Sara Abdali
Richard Anarfi
C. Barberan
Jia He
PILM
70
24
0
19 Mar 2024
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
Qibing Ren
Chang Gao
Jing Shao
Junchi Yan
Xin Tan
Wai Lam
Lizhuang Ma
ALM
ELM
AAML
42
21
0
12 Mar 2024
InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents
Qiusi Zhan
Zhixiang Liang
Zifan Ying
Daniel Kang
LLMAG
46
73
0
05 Mar 2024
Exploring Advanced Methodologies in Security Evaluation for LLMs
Junming Huang
Jiawei Zhang
Qi Wang
Weihong Han
Yanchun Zhang
45
0
0
28 Feb 2024
Coercing LLMs to do and reveal (almost) anything
Jonas Geiping
Alex Stein
Manli Shu
Khalid Saifullah
Yuxin Wen
Tom Goldstein
AAML
48
43
0
21 Feb 2024
Semantic Mirror Jailbreak: Genetic Algorithm Based Jailbreak Prompts Against Open-source LLMs
Xiaoxia Li
Siyuan Liang
Jiyi Zhang
Hansheng Fang
Aishan Liu
Ee-Chien Chang
90
24
0
21 Feb 2024
A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models
Zihao Xu
Yi Liu
Gelei Deng
Yuekang Li
S. Picek
PILM
AAML
33
35
0
21 Feb 2024
Is the System Message Really Important to Jailbreaks in Large Language Models?
Xiaotian Zou
Yongkang Chen
Ke Li
22
13
0
20 Feb 2024
Prompt Stealing Attacks Against Large Language Models
Zeyang Sha
Yang Zhang
SILM
AAML
43
28
0
20 Feb 2024
ROSE Doesn't Do That: Boosting the Safety of Instruction-Tuned Large Language Models with Reverse Prompt Contrastive Decoding
Qihuang Zhong
Liang Ding
Juhua Liu
Bo Du
Dacheng Tao
LM&MA
44
22
0
19 Feb 2024
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
Fengqing Jiang
Zhangchen Xu
Luyao Niu
Zhen Xiang
Bhaskar Ramasubramanian
Bo Li
Radha Poovendran
47
86
0
19 Feb 2024
A StrongREJECT for Empty Jailbreaks
Alexandra Souly
Qingyuan Lu
Dillon Bowen
Tu Trinh
Elvis Hsieh
...
Pieter Abbeel
Justin Svegliato
Scott Emmons
Olivia Watkins
Sam Toyer
31
67
0
15 Feb 2024
PAL: Proxy-Guided Black-Box Attack on Large Language Models
Chawin Sitawarin
Norman Mu
David A. Wagner
Alexandre Araujo
ELM
26
29
0
15 Feb 2024