Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2410.13236
Cited By
SPIN: Self-Supervised Prompt INjection
17 October 2024
Leon Zhou
Junfeng Yang
Chengzhi Mao
AAML
SILM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SPIN: Self-Supervised Prompt INjection"
11 / 11 papers shown
Title
How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
Yi Zeng
Hongpeng Lin
Jingwen Zhang
Diyi Yang
Ruoxi Jia
Weiyan Shi
78
298
0
12 Jan 2024
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
282
1,436
0
27 Jul 2023
Large Language Models
Michael R Douglas
LLMAG
LM&MA
127
616
0
11 Jul 2023
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks
Daniel Kang
Xuechen Li
Ion Stoica
Carlos Guestrin
Matei A. Zaharia
Tatsunori Hashimoto
AAML
84
249
0
11 Feb 2023
Adversarial Attacks are Reversible with Natural Supervision
Chengzhi Mao
Mia Chiquer
Hao Wang
Junfeng Yang
Carl Vondrick
BDL
AAML
57
56
0
26 Mar 2021
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
133
1,191
0
24 Sep 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
526
4,773
0
23 Jan 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
452
1,717
0
18 Sep 2019
Scalable agent alignment via reward modeling: a research direction
Jan Leike
David M. Krueger
Tom Everitt
Miljan Martic
Vishal Maini
Shane Legg
86
413
0
19 Nov 2018
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi
Eunsol Choi
Daniel S. Weld
Luke Zettlemoyer
RALM
195
2,636
0
09 May 2017
Intriguing properties of neural networks
Christian Szegedy
Wojciech Zaremba
Ilya Sutskever
Joan Bruna
D. Erhan
Ian Goodfellow
Rob Fergus
AAML
239
14,893
1
21 Dec 2013
1