Self-Deception: Reverse Penetrating the Semantic Firewall of Large Language Models
Zhenhua Wang, Wei Xie, Kai Chen, Baosheng Wang, Zhiwen Gui, Enze Wang
arXiv: 2308.11521. 16 August 2023. Tags: AAML, SILM.
Papers citing "Self-Deception: Reverse Penetrating the Semantic Firewall of Large Language Models" (14 of 14 papers shown):
- Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models. Lyne Tchapmi, Mingyu Derek Ma, Fei Wang, Chaowei Xiao, Muhao Chen. SILM. 24 May 2023.
- A Plot is Worth a Thousand Words: Model Information Stealing Attacks via Scientific Plots. Boyang Zhang, Xinlei He, Yun Shen, Tianhao Wang, Yang Zhang. AAML. 23 Feb 2023.
- Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks. Daniel Kang, Xuechen Li, Ion Stoica, Carlos Guestrin, Matei A. Zaharia, Tatsunori Hashimoto. AAML. 11 Feb 2023.
- BEAGLE: Forensics of Deep Learning Backdoor Attack for Better Defense. Shuyang Cheng, Guanhong Tao, Yingqi Liu, Shengwei An, Xiangzhe Xu, ..., Guangyu Shen, Kaiyuan Zhang, Qiuling Xu, Shiqing Ma, Xiangyu Zhang. AAML. 16 Jan 2023.
- Constitutional AI: Harmlessness from AI Feedback. Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, John Kernion, ..., Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom B. Brown, Jared Kaplan. SyDa, MoMe. 15 Dec 2022.
- Careful What You Wish For: on the Extraction of Adversarially Trained Models. Kacem Khaled, Gabriela Nicolescu, F. Magalhães. MIACV, AAML. 21 Jul 2022.
- Emergent Abilities of Large Language Models. Jason W. Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, ..., Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, J. Dean, W. Fedus. ELM, ReLM, LRM. 15 Jun 2022.
- XAI for Cybersecurity: State of the Art, Challenges, Open Issues and Future Directions. Gautam Srivastava, Rutvij H. Jhaveri, S. Bhattacharya, Sharnil Pandya, Rajeswari, Praveen Kumar Reddy Maddikunta, Gokul Yenduri, Jon G. Hall, M. Alazab, Thippa Reddy Gadekallu. 03 Jun 2022.
- Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets. Florian Tramèr, Reza Shokri, Ayrton San Joaquin, Hoang Minh Le, Matthew Jagielski, Sanghyun Hong, Nicholas Carlini. MIACV. 31 Mar 2022.
- Fooling the Eyes of Autonomous Vehicles: Robust Physical Adversarial Examples Against Traffic Sign Recognition Systems. Wei Jia, Zhaojun Lu, Haichun Zhang, Zhenglin Liu, Jie Wang, Gang Qu. AAML. 17 Jan 2022.
- Ethical and social risks of harm from Language Models. Laura Weidinger, John F. J. Mellor, Maribeth Rauh, Conor Griffin, J. Uesato, ..., Lisa Anne Hendricks, William S. Isaac, Sean Legassick, G. Irving, Iason Gabriel. PILM. 08 Dec 2021.
- Poisoning the Unlabeled Dataset of Semi-Supervised Learning. Nicholas Carlini. AAML. 04 May 2021.
- GPT Understands, Too. Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, Jie Tang. VLM. 18 Mar 2021.
- Prefix-Tuning: Optimizing Continuous Prompts for Generation. Xiang Lisa Li, Percy Liang. 01 Jan 2021.