Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2302.05733
Cited By
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks
11 February 2023
Daniel Kang
Xuechen Li
Ion Stoica
Carlos Guestrin
Matei A. Zaharia
Tatsunori Hashimoto
AAML
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks"
28 / 178 papers shown
Title
"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Xinyue Shen
Zhenpeng Chen
Michael Backes
Yun Shen
Yang Zhang
SILM
40
249
0
07 Aug 2023
Mondrian: Prompt Abstraction Attack Against Large Language Models for Cheaper API Pricing
Waiman Si
Michael Backes
Yang Zhang
30
5
0
07 Aug 2023
From Prompt Injections to SQL Injection Attacks: How Protected is Your LLM-Integrated Web Application?
Rodrigo Pedro
Daniel Castro
Paulo Carreira
Nuno Santos
SILM
AAML
41
51
0
03 Aug 2023
Embedding Democratic Values into Social Media AIs via Societal Objective Functions
Chenyan Jia
Michelle S. Lam
Minh Chau Mai
Jeffrey T. Hancock
Michael S. Bernstein
23
27
0
26 Jul 2023
LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?
David Glukhov
Ilia Shumailov
Y. Gal
Nicolas Papernot
Vardan Papyan
AAML
ELM
30
57
0
20 Jul 2023
Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models
Huachuan Qiu
Shuai Zhang
Anqi Li
Hongliang He
Zhenzhong Lan
ALM
44
48
0
17 Jul 2023
Jailbroken: How Does LLM Safety Training Fail?
Alexander Wei
Nika Haghtalab
Jacob Steinhardt
107
852
0
05 Jul 2023
TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models
Yue Huang
Qihui Zhang
Philip S. Y
Lichao Sun
18
46
0
20 Jun 2023
From Bad to Worse: Using Private Data to Propagate Disinformation on Online Platforms with a Greater Efficiency
Protik Bose Pranto
Waqar Hassan Khan
Sahar Abdelnabi
Rebecca Weil
Mario Fritz
Rakibul Hasan
13
2
0
08 Jun 2023
Can large language models democratize access to dual-use biotechnology?
Emily H. Soice
R. Rocha
Kimberlee Cordova
Michael A. Specter
K. Esvelt
14
46
0
06 Jun 2023
Spear or Shield: Leveraging Generative AI to Tackle Security Threats of Intelligent Network Services
Hongyang Du
Dusit Niyato
Jiawen Kang
Zehui Xiong
K. Lam
Ya-Nan Fang
Yonghui Li
AAML
29
13
0
04 Jun 2023
Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks
Abhinav Rao
S. Vashistha
Atharva Naik
Somak Aditya
Monojit Choudhury
35
17
0
24 May 2023
In-Context Impersonation Reveals Large Language Models' Strengths and Biases
Leonard Salewski
Stephan Alaniz
Isabel Rio-Torto
Eric Schulz
Zeynep Akata
44
151
0
24 May 2023
Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models
Oana Ignat
Zhijing Jin
Artem Abzaliev
Laura Biester
Santiago Castro
...
Verónica Pérez-Rosas
Siqi Shen
Zekun Wang
Winston Wu
Rada Mihalcea
LRM
41
6
0
21 May 2023
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
Xiaowei Huang
Wenjie Ruan
Wei Huang
Gao Jin
Yizhen Dong
...
Sihao Wu
Peipei Xu
Dengyu Wu
André Freitas
Mustafa A. Mustafa
ALM
45
82
0
19 May 2023
Beyond the Safeguards: Exploring the Security Risks of ChatGPT
Erik Derner
Kristina Batistic
SILM
27
65
0
13 May 2023
In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT
Xinyue Shen
Zhenpeng Chen
Michael Backes
Yang Zhang
27
55
0
18 Apr 2023
Multi-step Jailbreaking Privacy Attacks on ChatGPT
Haoran Li
Dadi Guo
Wei Fan
Mingshi Xu
Jie Huang
Fanpu Meng
Yangqiu Song
SILM
47
321
0
11 Apr 2023
Large Language Model Instruction Following: A Survey of Progresses and Challenges
Renze Lou
Kai Zhang
Wenpeng Yin
ALM
LRM
32
20
0
18 Mar 2023
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake
Sahar Abdelnabi
Shailesh Mishra
C. Endres
Thorsten Holz
Mario Fritz
SILM
49
436
0
23 Feb 2023
On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
Omar Shaikh
Hongxin Zhang
William B. Held
Michael S. Bernstein
Diyi Yang
ReLM
LRM
35
184
0
15 Dec 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
227
506
0
28 Sep 2022
TempLM: Distilling Language Models into Template-Based Generators
Tianyi Zhang
Mina Lee
Lisa Li
Ende Shen
Tatsunori B. Hashimoto
VLM
40
5
0
23 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
339
12,003
0
04 Mar 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
213
1,661
0
15 Oct 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
208
627
0
20 May 2021
LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
Yuhuai Wu
M. Rabe
Wenda Li
Jimmy Ba
Roger C. Grosse
Christian Szegedy
AIMat
LRM
75
51
0
15 Jan 2021
Robust Encodings: A Framework for Combating Adversarial Typos
Erik Jones
Robin Jia
Aditi Raghunathan
Percy Liang
AAML
142
102
0
04 May 2020
Previous
1
2
3
4