Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study
arXiv:2305.13860 | 23 May 2023
Yi Liu, Gelei Deng, Zhengzi Xu, Yuekang Li, Yaowen Zheng, Ying Zhang, Lida Zhao, Tianwei Zhang, Kailong Wang, Yang Liu
Papers citing "Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study" (showing 50 of 66)
CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs
  Sijia Chen, Xiaomin Li, Mengxue Zhang, Eric Hanchen Jiang, Qingcheng Zeng, Chen-Hsiang Yu | AAML, MU, ELM | 16 May 2025
LM-Scout: Analyzing the Security of Language Model Integration in Android Apps
  Muhammad Ibrahim, Güliz Seray Tuncay, Z. Berkay Celik, Aravind Machiry, Antonio Bianchi | 13 May 2025
One Trigger Token Is Enough: A Defense Strategy for Balancing Safety and Usability in Large Language Models
  Haoran Gu, Handing Wang, Yi Mei, Mengjie Zhang, Yaochu Jin | 12 May 2025
Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs
  Chetan Pathade | AAML, SILM | 07 May 2025
Attack and defense techniques in large language models: A survey and new perspectives
  Zhiyu Liao, Kang Chen, Yuanguo Lin, Kangkang Li, Yunxuan Liu, Hefeng Chen, Xingwang Huang, Yuanhui Yu | AAML | 02 May 2025
RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models
  Bang An, Shiyue Zhang, Mark Dredze | 25 Apr 2025
Feature-Aware Malicious Output Detection and Mitigation
  Weilong Dong, Peiguang Li, Yu Tian, Xinyi Zeng, Fengdi Li, Sirui Wang | AAML | 12 Apr 2025
StyleRec: A Benchmark Dataset for Prompt Recovery in Writing Style Transformation
  Shenyang Liu, Yang Gao, Shaoyan Zhai, Liqiang Wang | 06 Apr 2025
sudo rm -rf agentic_security
  Sejin Lee, Jian Kim, Haon Park, Ashkan Yousefpour, Sangyoon Yu, Min Song | AAML | 26 Mar 2025
Single-pass Detection of Jailbreaking Input in Large Language Models
  Leyla Naz Candogan, Yongtao Wu, Elias Abad Rocamora, Grigorios G. Chrysos, V. Cevher | AAML | 24 Feb 2025
SafeDialBench: A Fine-Grained Safety Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks
  Hongye Cao, Yanming Wang, Sijia Jing, Ziyue Peng, Zhixin Bai, ..., Yang Gao, Fanyu Meng, Xi Yang, Chao Deng, Junlan Feng | AAML | 16 Feb 2025
Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks
  Ang Li, Yin Zhou, Vethavikashini Chithrra Raghuram, Tom Goldstein, Micah Goldblum | AAML | 12 Feb 2025
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
  Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, Fangzhao Wu | AAML | 28 Jan 2025
When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
  Xuan Chen, Yuzhou Nie, Wenbo Guo, Xiangyu Zhang | 28 Jan 2025
Playing Devil's Advocate: Unmasking Toxicity and Vulnerabilities in Large Vision-Language Models
  Abdulkadir Erol, Trilok Padhi, Agnik Saha, Ugur Kursuncu, Mehmet Emin Aktas | 17 Jan 2025
ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models
  Han Zhang, Hongfu Gao, Qiang Hu, Guanhua Chen, L. Yang, Bingyi Jing, Hongxin Wei, Bing Wang, Haifeng Bai, Lei Yang | AILaw, ELM | 24 Oct 2024
Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks
  Samuele Poppi, Zheng-Xin Yong, Yifei He, Bobbie Chern, Han Zhao, Aobo Yang, Jianfeng Chi | AAML | 23 Oct 2024
SPIN: Self-Supervised Prompt INjection
  Leon Zhou, Junfeng Yang, Chengzhi Mao | AAML, SILM | 17 Oct 2024
RePD: Defending Jailbreak Attack through a Retrieval-based Prompt Decomposition Process
  Peiran Wang, Xiaogeng Liu, Chaowei Xiao | AAML | 11 Oct 2024
Do Unlearning Methods Remove Information from Language Model Weights?
  Aghyad Deeb, Fabien Roger | AAML, MU | 11 Oct 2024
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
  Shanshan Han | 09 Oct 2024
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
  Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min Lin | 09 Oct 2024
PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs
  Jiahao Yu, Yangguang Shao, Hanwen Miao, Junzheng Shi | SILM, AAML | 23 Sep 2024
Prompt Obfuscation for Large Language Models
  David Pape, Thorsten Eisenhofer, Lea Schönherr | AAML | 17 Sep 2024
Hacking, The Lazy Way: LLM Augmented Pentesting
  Dhruva Goyal, Sitaraman Subramanian, Aditya Peela, Nisha P. Shetty | 14 Sep 2024
Differentially Private Kernel Density Estimation
  Erzhi Liu, Jerry Yao-Chieh Hu, Alex Reneau, Zhao Song, Han Liu | 03 Sep 2024
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification
  Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, Yang Zhang | LLMAG, AAML | 30 Jul 2024
LLMmap: Fingerprinting For Large Language Models
  Dario Pasquini, Evgenios M. Kornaropoulos, G. Ateniese | 22 Jul 2024
From Theft to Bomb-Making: The Ripple Effect of Unlearning in Defending Against Jailbreak Attacks
  Zhexin Zhang, Junxiao Yang, Yida Lu, Pei Ke, Shiyao Cui, Chujie Zheng, Hongning Wang, Minlie Huang | MU, AAML | 03 Jul 2024
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
  Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, ..., Bo Li, Kai Li, Danqi Chen, Peter Henderson, Prateek Mittal | ALM, ELM | 20 Jun 2024
JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
  Delong Ran, Jinyuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang | ELM | 13 Jun 2024
We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs
  Joseph Spracklen, Raveen Wijewickrama, A. H. M. N. Sakib, Anindya Maiti, Murtuza Jadliwala | 12 Jun 2024
Merging Improves Self-Critique Against Jailbreak Attacks
  Victor Gallego | AAML, MoMe | 11 Jun 2024
Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents
  Avital Shafran, R. Schuster, Vitaly Shmatikov | 09 Jun 2024
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
  Xunguang Wang, Daoyuan Wu, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Shuai Wang, Yingjiu Li, Yang Liu, Ning Liu, Juergen Rahmel | AAML | 08 Jun 2024
Enhancing Jailbreak Attack Against Large Language Models through Silent Tokens
  Jiahao Yu, Haozheng Luo, Jerry Yao-Chieh Hu, Wenbo Guo, Han Liu, Xinyu Xing | 31 May 2024
Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models
  Chia-Yi Hsu, Yu-Lin Tsai, Chih-Hsun Lin, Pin-Yu Chen, Chia-Mu Yu, Chun-ying Huang | 27 May 2024
Red-Teaming for Inducing Societal Bias in Large Language Models
  Chunyan Luo, Ahmad Ghawanmeh, Bharat Bhimshetty, Kashyap Murali, Murli Jadhav, Xiaodan Zhu, Faiza Khan Khattak | KELM | 08 May 2024
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
  Xuan Xie, Jiayang Song, Zhehua Zhou, Yuheng Huang, Da Song, Lei Ma | OffRL | 12 Apr 2024
Sandwich attack: Multi-language Mixture Adaptive Attack on LLMs
  Bibek Upadhayay, Vahid Behzadan | AAML | 09 Apr 2024
Vocabulary Attack to Hijack Large Language Model Applications
  Patrick Levi, Christoph P. Neumann | AAML | 03 Apr 2024
Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack
  M. Russinovich, Ahmed Salem, Ronen Eldan | 02 Apr 2024
Optimization-based Prompt Injection Attack to LLM-as-a-Judge
  Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong | AAML | 26 Mar 2024
ImgTrojan: Jailbreaking Vision-Language Models with ONE Image
  Xijia Tao, Shuai Zhong, Lei Li, Qi Liu, Lingpeng Kong | 05 Mar 2024
Foot In The Door: Understanding Large Language Model Jailbreaking via Cognitive Psychology
  Zhenhua Wang, Wei Xie, Baosheng Wang, Enze Wang, Zhiwen Gui, Shuoyoucheng Ma, Kai Chen | 24 Feb 2024
Can Large Language Models Detect Misinformation in Scientific News Reporting?
  Yupeng Cao, Aishwarya Muralidharan Nair, Elyon Eyimife, Nastaran Jamalipour Soofi, K. P. Subbalakshmi, J. Wullert, Chumki Basu, David Shallcross | 22 Feb 2024
AbuseGPT: Abuse of Generative AI ChatBots to Create Smishing Campaigns
  Ashfak Md Shibli, Mir Mehedi A. Pritom, Maanak Gupta | 15 Feb 2024
Comprehensive Assessment of Jailbreak Attacks Against LLMs
  Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, Yang Zhang | AAML | 08 Feb 2024
Black-Box Access is Insufficient for Rigorous AI Audits
  Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, ..., Michael Gerovitch, David Bau, Max Tegmark, David M. Krueger, Dylan Hadfield-Menell | AAML | 25 Jan 2024
All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks
  Kazuhiro Takemoto | 18 Jan 2024