Multi-step Jailbreaking Privacy Attacks on ChatGPT

11 April 2023
Haoran Li
Dadi Guo
Wei Fan
Mingshi Xu
Jie Huang
Fanpu Meng
Yangqiu Song
    SILM

Papers citing "Multi-step Jailbreaking Privacy Attacks on ChatGPT"

Showing 50 of 237 citing papers:
From Persona to Personalization: A Survey on Role-Playing Language Agents
Jiangjie Chen
Xintao Wang
Rui Xu
Siyu Yuan
Yikai Zhang
...
Caiyu Hu
Siye Wu
Scott Ren
Ziquan Fu
Yanghua Xiao
28 Apr 2024
Online Personalizing White-box LLMs Generation with Neural Bandits
Zekai Chen
Weeden Daniel
Po-yu Chen
Francois Buet-Golfouse
24 Apr 2024
JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
Yingchaojie Feng
Zhizhang Chen
Zhining Kang
Sijia Wang
Minfeng Zhu
Wei Zhang
Wei Chen
12 Apr 2024
Two Heads are Better than One: Nested PoE for Robust Defense Against Multi-Backdoors
Victoria Graf
Qin Liu
Muhao Chen
AAML
02 Apr 2024
Exploring the Privacy Protection Capabilities of Chinese Large Language Models
Yuqi Yang
Xiaowen Huang
Jitao Sang
ELM
PILM
AILaw
27 Mar 2024
Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models
Zhiyuan Yu
Xiaogeng Liu
Shunning Liang
Zach Cameron
Chaowei Xiao
Ning Zhang
26 Mar 2024
Optimization-based Prompt Injection Attack to LLM-as-a-Judge
Jiawen Shi
Zenghui Yuan
Yinuo Liu
Yue Huang
Pan Zhou
Lichao Sun
Neil Zhenqiang Gong
AAML
26 Mar 2024
Risk and Response in Large Language Models: Evaluating Key Threat Categories
Bahareh Harandizadeh
A. Salinas
Fred Morstatter
22 Mar 2024
Mapping LLM Security Landscapes: A Comprehensive Stakeholder Risk Assessment Proposal
Rahul Pankajakshan
Sumitra Biswal
Yuvaraj Govindarajulu
Gilad Gressel
20 Mar 2024
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
Zehui Chen
Kuikun Liu
Qiuchen Wang
Wenwei Zhang
Jiangning Liu
Dahua Lin
Kai-xiang Chen
Feng Zhao
LLMAG
ALM
AIFin
19 Mar 2024
Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices
Sara Abdali
Richard Anarfi
C. Barberan
Jia He
PILM
19 Mar 2024
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
Weikang Zhou
Xiao Wang
Limao Xiong
Han Xia
Yingshuang Gu
...
Lijun Li
Jing Shao
Tao Gui
Qi Zhang
Xuanjing Huang
18 Mar 2024
AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic
Emad A. Alghamdi
Reem I. Masoud
Deema Alnuhait
Afnan Y. Alomairi
Ahmed Ashraf
Mohamed Zaytoon
14 Mar 2024
Review of Generative AI Methods in Cybersecurity
Yagmur Yigit
William J. Buchanan
Madjid G Tehrani
Leandros A. Maglaras
AAML
13 Mar 2024
A Safe Harbor for AI Evaluation and Red Teaming
Shayne Longpre
Sayash Kapoor
Kevin Klyman
Ashwin Ramaswami
Rishi Bommasani
...
Daniel Kang
Sandy Pentland
Arvind Narayanan
Percy Liang
Peter Henderson
07 Mar 2024
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks
Yifan Zeng
Yiran Wu
Xiao Zhang
Huazheng Wang
Qingyun Wu
LLMAG
AAML
02 Mar 2024
AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks
Jiacen Xu
Jack W. Stokes
Geoff McDonald
Xuesong Bai
David Marshall
Siyue Wang
Adith Swaminathan
Zhou Li
02 Mar 2024
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors
Zhexin Zhang
Yida Lu
Jingyuan Ma
Di Zhang
Rui Li
...
Hao Sun
Lei Sha
Zhifang Sui
Hongning Wang
Minlie Huang
26 Feb 2024
Farsight: Fostering Responsible AI Awareness During AI Application Prototyping
Zijie J. Wang
Chinmay Kulkarni
Lauren Wilcox
Michael Terry
Michael A. Madaio
23 Feb 2024
Semantic Mirror Jailbreak: Genetic Algorithm Based Jailbreak Prompts Against Open-source LLMs
Xiaoxia Li
Siyuan Liang
Jiyi Zhang
Hansheng Fang
Aishan Liu
Ee-Chien Chang
21 Feb 2024
Is the System Message Really Important to Jailbreaks in Large Language Models?
Xiaotian Zou
Yongkang Chen
Ke Li
20 Feb 2024
Prompt Stealing Attacks Against Large Language Models
Zeyang Sha
Yang Zhang
SILM
AAML
20 Feb 2024
Defending Jailbreak Prompts via In-Context Adversarial Game
Yujun Zhou
Yufei Han
Haomin Zhuang
Kehan Guo
Zhenwen Liang
Hongyan Bao
Xiangliang Zhang
LLMAG
AAML
20 Feb 2024
Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
Zhichen Dong
Zhanhui Zhou
Chao Yang
Jing Shao
Yu Qiao
ELM
14 Feb 2024
COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
Xing-ming Guo
Fangxu Yu
Huan Zhang
Lianhui Qin
Bin Hu
AAML
13 Feb 2024
Pandora: Jailbreak GPTs by Retrieval Augmented Generation Poisoning
Gelei Deng
Yi Liu
Kailong Wang
Yuekang Li
Tianwei Zhang
Yang Liu
13 Feb 2024
Data Reconstruction Attacks and Defenses: A Systematic Evaluation
Sheng Liu
Zihan Wang
Yuxiao Chen
Qi Lei
AAML
MIACV
13 Feb 2024
PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models
Wei Zou
Runpeng Geng
Binghui Wang
Jinyuan Jia
SILM
12 Feb 2024
StruQ: Defending Against Prompt Injection with Structured Queries
Sizhe Chen
Julien Piet
Chawin Sitawarin
David Wagner
SILM
AAML
09 Feb 2024
Fight Back Against Jailbreaking via Prompt Adversarial Tuning
Yichuan Mo
Yuji Wang
Zeming Wei
Yisen Wang
AAML
SILM
09 Feb 2024
Comprehensive Assessment of Jailbreak Attacks Against LLMs
Junjie Chu
Yugeng Liu
Ziqing Yang
Xinyue Shen
Michael Backes
Yang Zhang
AAML
08 Feb 2024
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models
Haibo Jin
Ruoxi Chen
Andy Zhou
Yang Zhang
Haohan Wang
LLMAG
05 Feb 2024
Aligner: Efficient Alignment by Learning to Correct
Jiaming Ji
Boyuan Chen
Hantao Lou
Chongye Guo
Borong Zhang
Xuehai Pan
Juntao Dai
Tianyi Qiu
Yaodong Yang
04 Feb 2024
Building Guardrails for Large Language Models
Yizhen Dong
Ronghui Mu
Gao Jin
Yi Qi
Jinwei Hu
Xingyu Zhao
Jie Meng
Wenjie Ruan
Xiaowei Huang
OffRL
02 Feb 2024
An Early Categorization of Prompt Injection Attacks on Large Language Models
Sippo Rossi
Alisia Marianne Michel
R. Mukkamala
J. Thatcher
SILM
AAML
31 Jan 2024
Security and Privacy Challenges of Large Language Models: A Survey
B. Das
M. H. Amini
Yanzhao Wu
PILM
ELM
30 Jan 2024
Red-Teaming for Generative AI: Silver Bullet or Security Theater?
Michael Feffer
Anusha Sinha
Wesley Hanwen Deng
Zachary Chase Lipton
Hoda Heidari
AAML
29 Jan 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
25 Jan 2024
MULTIVERSE: Exposing Large Language Model Alignment Problems in Diverse Worlds
Xiaolong Jin
Zhuo Zhang
Xiangyu Zhang
25 Jan 2024
The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts
Lingfeng Shen
Weiting Tan
Sihao Chen
Yunmo Chen
Jingyu Zhang
Haoran Xu
Boyuan Zheng
Philipp Koehn
Daniel Khashabi
23 Jan 2024
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
Tongxin Yuan
Zhiwei He
Lingzhong Dong
Yiming Wang
Ruijie Zhao
...
Binglin Zhou
Fangqi Li
Zhuosheng Zhang
Rui Wang
Gongshen Liu
ELM
18 Jan 2024
AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models
Dong Shu
Mingyu Jin
Suiyuan Zhu
Beichen Wang
Zihao Zhou
Chong Zhang
Yongfeng Zhang
ELM
17 Jan 2024
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui
Yanling Wang
Chuanpu Fu
Yong Xiao
Sijia Li
...
Junwu Xiong
Xinyu Kong
Zujie Wen
Ke Xu
Qi Li
11 Jan 2024
A Novel Evaluation Framework for Assessing Resilience Against Prompt Injection Attacks in Large Language Models
Daniel Wankit Yip
Aysan Esmradi
C. Chan
AAML
02 Jan 2024
SecFormer: Towards Fast and Accurate Privacy-Preserving Inference for Large Language Models
Jinglong Luo
Yehong Zhang
Zhuo Zhang
Jiaqi Zhang
Xin Mu
Hui Wang
Yue Yu
Zenglin Xu
01 Jan 2024
Jatmo: Prompt Injection Defense by Task-Specific Finetuning
Julien Piet
Maha Alrashed
Chawin Sitawarin
Sizhe Chen
Zeming Wei
Elizabeth Sun
Basel Alomair
David Wagner
AAML
SyDa
29 Dec 2023
Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning
Xiao-Yang Liu
Rongyi Zhu
Daochen Zha
Jiechao Gao
Shan Zhong
Matt White
Meikang Qiu
29 Dec 2023
A Comprehensive Survey of Attack Techniques, Implementation, and Mitigation Strategies in Large Language Models
Aysan Esmradi
Daniel Wankit Yip
C. Chan
AAML
18 Dec 2023
Causality Analysis for Evaluating the Security of Large Language Models
Wei Zhao
Zhe Li
Junfeng Sun
13 Dec 2023
Maatphor: Automated Variant Analysis for Prompt Injection Attacks
Ahmed Salem
Andrew J. Paverd
Boris Köpf
12 Dec 2023