Multi-step Jailbreaking Privacy Attacks on ChatGPT
arXiv 2304.05197 · 11 April 2023
Haoran Li, Dadi Guo, Wei Fan, Mingshi Xu, Jie Huang, Fanpu Meng, Yangqiu Song
SILM
Papers citing "Multi-step Jailbreaking Privacy Attacks on ChatGPT" (50 of 238 papers shown)
Maatphor: Automated Variant Analysis for Prompt Injection Attacks · Ahmed Salem, Andrew J. Paverd, Boris Köpf · 12 Dec 2023
METAL: Metamorphic Testing Framework for Analyzing Large-Language Model Qualities · Sangwon Hyun, Mingyu Guo, Muhammad Ali Babar · 11 Dec 2023
Make Them Spill the Beans! Coercive Knowledge Extraction from (Production) LLMs · Zhuo Zhang, Guangyu Shen, Guanhong Tao, Shuyang Cheng, Xiangyu Zhang · 08 Dec 2023
Analyzing the Inherent Response Tendency of LLMs: Real-World Instructions-Driven Jailbreak · Yanrui Du, Sendong Zhao, Ming Ma, Yuhan Chen, Bing Qin · 07 Dec 2023
Dr. Jekyll and Mr. Hyde: Two Faces of LLMs · Matteo Gioele Collu, Tom Janssen-Groesbeek, Stefanos Koffas, Mauro Conti, S. Picek · 06 Dec 2023
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically · Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer, Amin Karbasi · 04 Dec 2023
A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly · Yifan Yao, Jinhao Duan, Kaidi Xu, Yuanfang Cai, Eric Sun, Yue Zhang · PILM, ELM · 04 Dec 2023
Walking a Tightrope -- Evaluating Large Language Models in High-Risk Domains · Chia-Chien Hung, Wiem Ben-Rim, Lindsay Frost, Lars Bruckner, Carolin (Haas) Lawrence · AILaw, ALM, ELM · 25 Nov 2023
Exploiting Large Language Models (LLMs) through Deception Techniques and Persuasion Principles · Sonali Singh, Faranak Abri, A. Namin · 24 Nov 2023
Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization · Zhexin Zhang, Junxiao Yang, Pei Ke, Fei Mi, Hongning Wang, Minlie Huang · AAML · 15 Nov 2023
Alignment is not sufficient to prevent large language models from generating harmful information: A psychoanalytic perspective · Zi Yin, Wei Ding, Jia Liu · 14 Nov 2023
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models · Bertie Vidgen, Nino Scherrer, Hannah Rose Kirk, Rebecca Qian, Anand Kannappan, Scott A. Hale, Paul Röttger · ALM, ELM · 14 Nov 2023
WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models · Shangqing Tu, Yuliang Sun, Yushi Bai, Jifan Yu, Lei Hou, Juanzi Li · WaLM · 13 Nov 2023
Flames: Benchmarking Value Alignment of LLMs in Chinese · Kexin Huang, Xiangyang Liu, Qianyu Guo, Tianxiang Sun, Jiawei Sun, ..., Yixu Wang, Yan Teng, Xipeng Qiu, Yingchun Wang, Dahua Lin · ALM · 12 Nov 2023
Fake Alignment: Are LLMs Really Aligned Well? · Yixu Wang, Yan Teng, Kexin Huang, Chengqi Lyu, Songyang Zhang, Wenwei Zhang, Xingjun Ma, Yu-Gang Jiang, Yu Qiao, Yingchun Wang · 10 Nov 2023
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts · Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, Xiaoyun Wang · MLLM · 09 Nov 2023
A Survey of Large Language Models in Medicine: Progress, Application, and Challenge · Hongjian Zhou, Fenglin Liu, Boyang Gu, Xinyu Zou, Jinfa Huang, ..., Yefeng Zheng, Lei A. Clifton, Zheng Li, Fenglin Liu, David A. Clifton · LM&MA · 09 Nov 2023
PrivLM-Bench: A Multi-level Privacy Evaluation Benchmark for Language Models · Haoran Li, Dadi Guo, Donghao Li, Wei Fan, Qi Hu, Xin Liu, Chunkit Chan, Duanyi Yao, Yuan Yao, Yangqiu Song · PILM · 07 Nov 2023
DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models · Xinwei Wu, Junzhuo Li, Minghui Xu, Weilong Dong, Shuangzhi Wu, Chao Bian, Deyi Xiong · MU, KELM · 31 Oct 2023
Differentially Private Reward Estimation with Preference Feedback · Sayak Ray Chowdhury, Xingyu Zhou, Nagarajan Natarajan · 30 Oct 2023
From Chatbots to PhishBots? -- Preventing Phishing scams created using ChatGPT, Google Bard and Claude · Sayak Saha Roy, Poojitha Thota, Krishna Vamsi Naragam, Shirin Nilizadeh · SILM · 29 Oct 2023
SoK: Memorization in General-Purpose Large Language Models · Valentin Hartmann, Anshuman Suri, Vincent Bindschaedler, David E. Evans, Shruti Tople, Robert West · KELM, LLMAG · 24 Oct 2023
The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks · Xiaoyi Chen, Siyuan Tang, Rui Zhu, Shijun Yan, Lei Jin, Zihao Wang, Liya Su, Zhikun Zhang, Xiaofeng Wang, Haixu Tang · AAML, PILM · 24 Oct 2023
StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding · Cheng Jiayang, Lin Qiu, Tszho Chan, Tianqing Fang, Weiqi Wang, ..., Qipeng Guo, Hongming Zhang, Yangqiu Song, Yue Zhang, Zheng-Wei Zhang · 19 Oct 2023
Attack Prompt Generation for Red Teaming and Defending Large Language Models · Boyi Deng, Wenjie Wang, Fuli Feng, Yang Deng, Qifan Wang, Xiangnan He · AAML · 19 Oct 2023
Denevil: Towards Deciphering and Navigating the Ethical Values of Large Language Models via Instruction Learning · Shitong Duan, Xiaoyuan Yi, Peng Zhang, T. Lu, Xing Xie, Ning Gu · 17 Oct 2023
Privacy in Large Language Models: Attacks, Defenses and Future Directions · Haoran Li, Yulin Chen, Jinglong Luo, Yan Kang, Xiaojin Zhang, Qi Hu, Chunkit Chan, Yangqiu Song · PILM · 16 Oct 2023
Who Said That? Benchmarking Social Media AI Detection · Wanyun Cui, Linqiu Zhang, Qianle Wang, Shuyang Cai · DeLMO · 12 Oct 2023
Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation · Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, Danqi Chen · AAML · 10 Oct 2023
Multilingual Jailbreak Challenges in Large Language Models · Yue Deng, Wenxuan Zhang, Sinno Jialin Pan, Lidong Bing · AAML · 10 Oct 2023
AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models · Xiaogeng Liu, Nan Xu, Muhao Chen, Chaowei Xiao · SILM · 03 Oct 2023
Large Language Models Cannot Self-Correct Reasoning Yet · Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, Denny Zhou · ReLM, LRM · 03 Oct 2023
On the Safety of Open-Sourced Large Language Models: Does Alignment Really Prevent Them From Being Misused? · Hangfan Zhang, Zhimeng Guo, Huaisheng Zhu, Bochuan Cao, Lu Lin, Jinyuan Jia, Jinghui Chen, Di Wu · 02 Oct 2023
LatticeGen: A Cooperative Framework which Hides Generated Text in a Lattice for Privacy-Aware Generation on Cloud · Mengke Zhang, Tianxing He, Tianle Wang, Lu Mi, Fatemehsadat Mireshghallah, Binyi Chen, Hao Wang, Yulia Tsvetkov · 29 Sep 2023
Privacy-preserving and Privacy-attacking Approaches for Speech and Audio -- A Survey · Yuchen Liu, Apu Kapadia, Donald Williamson · AAML · 26 Sep 2023
Large Language Model Alignment: A Survey · Tianhao Shen, Renren Jin, Yufei Huang, Chuang Liu, Weilong Dong, Zishan Guo, Xinwei Wu, Yan Liu, Deyi Xiong · LM&MA · 26 Sep 2023
Can LLM-Generated Misinformation Be Detected? · Canyu Chen, Kai Shu · DeLMO · 25 Sep 2023
Goal-Oriented Prompt Attack and Safety Evaluation for LLMs · Chengyuan Liu, Fubang Zhao, Lizhi Qing, Yangyang Kang, Changlong Sun, Kun Kuang, Fei Wu · AAML · 21 Sep 2023
"It's a Fair Game", or Is It? Examining How Users Navigate Disclosure Risks and Benefits When Using LLM-Based Conversational Agents · Zhiping Zhang, Michelle Jia, Hao-Ping Lee, Bingsheng Yao, Sauvik Das, Ada Lerner, Dakuo Wang, Tianshi Li · SILM, ELM · 20 Sep 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts · Jiahao Yu, Xingwei Lin, Zheng Yu, Xinyu Xing · SILM · 19 Sep 2023
Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM · Bochuan Cao, Yu Cao, Lu Lin, Jinghui Chen · AAML · 18 Sep 2023
Self-Consistent Narrative Prompts on Abductive Natural Language Inference · Chunkit Chan, Xin Liu, Tszho Chan, Cheng Jiayang, Yangqiu Song, Ginny Wong, Simon See · LRM · 15 Sep 2023
SafetyBench: Evaluating the Safety of Large Language Models · Zhexin Zhang, Leqi Lei, Lindong Wu, Rui Sun, Yongkang Huang, Chong Long, Xiao Liu, Xuanyu Lei, Jie Tang, Minlie Huang · LRM, LM&MA, ELM · 13 Sep 2023
FuzzLLM: A Novel and Universal Fuzzing Framework for Proactively Discovering Jailbreak Vulnerabilities in Large Language Models · Dongyu Yao, Jianshu Zhang, Ian G. Harris, Marcel Carlsson · 11 Sep 2023
Down the Toxicity Rabbit Hole: A Novel Framework to Bias Audit Large Language Models · Arka Dutta, Adel Khorramrouz, Sujan Dutta, Ashiqur R. KhudaBukhsh · 08 Sep 2023
Demystifying RCE Vulnerabilities in LLM-Integrated Apps · Tong Liu, Zizhuang Deng, Guozhu Meng, Yuekang Li, Kai Chen · SILM · 06 Sep 2023
Baseline Defenses for Adversarial Attacks Against Aligned Language Models · Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping Yeh-Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, Tom Goldstein · AAML · 01 Sep 2023
Quantifying and Analyzing Entity-level Memorization in Large Language Models · Zhenhong Zhou, Jiuyang Xiang, Chao-Yi Chen, Sen Su · PILM · 30 Aug 2023
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities · Maximilian Mozes, Xuanli He, Bennett Kleinberg, Lewis D. Griffin · 24 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook · S. Latif, Moazzam Shoukat, Fahad Shamshad, Muhammad Usama, Yi Ren, ..., Wenwu Wang, Xulong Zhang, Roberto Togneri, Min Zhang, Björn W. Schuller · LM&MA, AuLLM · 24 Aug 2023