Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.14710
Cited By
Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models
24 May 2023
Lyne Tchapmi
Mingyu Derek Ma
Fei Wang
Chaowei Xiao
Muhao Chen
SILM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models"
12 / 62 papers shown
Title
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM
OffRL
52
473
0
27 Jul 2023
On the Exploitability of Instruction Tuning
Manli Shu
Jiong Wang
Chen Zhu
Jonas Geiping
Chaowei Xiao
Tom Goldstein
SILM
36
92
0
28 Jun 2023
An Overview of Catastrophic AI Risks
Dan Hendrycks
Mantas Mazeika
Thomas Woodside
SILM
26
167
0
21 Jun 2023
Poisoning Language Models During Instruction Tuning
Alexander Wan
Eric Wallace
Sheng Shen
Dan Klein
SILM
104
124
0
01 May 2023
ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger
Jiazhao Li
Yijin Yang
Zhuofeng Wu
V. Vydiswaran
Chaowei Xiao
SILM
55
42
0
27 Apr 2023
Analyzing And Editing Inner Mechanisms Of Backdoored Language Models
Max Lamparth
Anka Reuel
KELM
36
10
0
24 Feb 2023
Instruction Induction: From Few Examples to Natural Language Task Descriptions
Or Honovich
Uri Shaham
Samuel R. Bowman
Omer Levy
ELM
LRM
120
137
0
22 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
351
12,003
0
04 Mar 2022
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
Stephen H. Bach
Victor Sanh
Zheng-Xin Yong
Albert Webson
Colin Raffel
...
Khalid Almubarak
Xiangru Tang
Dragomir R. Radev
Mike Tian-Jian Jiang
Alexander M. Rush
VLM
225
339
0
02 Feb 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
215
1,661
0
15 Oct 2021
Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer
Fanchao Qi
Yangyi Chen
Xurui Zhang
Mukai Li
Zhiyuan Liu
Maosong Sun
AAML
SILM
82
175
0
14 Oct 2021
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
Mohit Iyyer
John Wieting
Kevin Gimpel
Luke Zettlemoyer
AAML
GAN
205
712
0
17 Apr 2018
Previous
1
2