Attack Prompt Generation for Red Teaming and Defending Large Language Models
arXiv: 2310.12505 · 19 October 2023
Boyi Deng, Wenjie Wang, Fuli Feng, Yang Deng, Qifan Wang, Xiangnan He
AAML
Links: arXiv (abs) · PDF · HTML · GitHub (45★)

Papers citing "Attack Prompt Generation for Red Teaming and Defending Large Language Models" (16 of 16 shown)

BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
Yunhan Zhao, Xiang Zheng, Lin Luo, Yige Li, Xingjun Ma, Yu-Gang Jiang
VLM · AAML · 6 citations · 28 Oct 2024

Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
Han Jiang, Xiaoyuan Yi, Zhihua Wei, Ziang Xiao, Shu Wang, Xing Xie
ELM · ALM · 8 citations · 20 Jun 2024

Toxicity in ChatGPT: Analyzing Persona-assigned Language Models
Ameet Deshpande, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan
LM&MA · LLMAG · 371 citations · 11 Apr 2023

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, C. Endres, Thorsten Holz, Mario Fritz
SILM · 498 citations · 23 Feb 2023

Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks
Daniel Kang, Xuechen Li, Ion Stoica, Carlos Guestrin, Matei A. Zaharia, Tatsunori Hashimoto
AAML · 253 citations · 11 Feb 2023

PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, ..., Kathy Meier-Hellstern, Douglas Eck, J. Dean, Slav Petrov, Noah Fiedel
PILM · LRM · 6,301 citations · 05 Apr 2022

Red Teaming Language Models with Language Models
Ethan Perez, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, G. Irving
AAML · 668 citations · 07 Feb 2022

Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le
ALM · UQCV · 3,789 citations · 03 Sep 2021

Mitigating harm in language models with conditional-likelihood filtration
Helen Ngo, Cooper D. Raterink, J. Araújo, Ivan Zhang, Carol Chen, Adrien Morisot, Nick Frosst
42 citations · 04 Aug 2021

Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets
Irene Solaiman, Christy Dennison
226 citations · 18 Jun 2021

LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
OffRL · AI4TS · AI4CE · ALM · AIMat · 10,526 citations · 17 Jun 2021

Recipes for Safety in Open-domain Chatbots
Jing Xu, Da Ju, Margaret Li, Y-Lan Boureau, Jason Weston, Emily Dinan
234 citations · 14 Oct 2020

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith
1,221 citations · 24 Sep 2020

BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, Kristina Toutanova
1,560 citations · 24 May 2019

Deep reinforcement learning from human preferences
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei
3,377 citations · 12 Jun 2017

RACE: Large-scale ReAding Comprehension Dataset From Examinations
Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, Eduard H. Hovy
ELM · 1,359 citations · 15 Apr 2017