Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models
Lyne Tchapmi, Mingyu Derek Ma, Fei Wang, Chaowei Xiao, Muhao Chen
arXiv:2305.14710 · 24 May 2023 · SILM
Papers citing "Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models" (50 of 62 papers shown)
Emerging Cyber Attack Risks of Medical AI Agents
Jianing Qiu, Lin Li, Jiankai Sun, Hao Wei, Zhe Xu, K. Lam, Wu Yuan · AAML · 02 Apr 2025

Data Poisoning in Deep Learning: A Survey
Pinlong Zhao, Weiyao Zhu, Pengfei Jiao, Di Gao, Ou Wu · AAML · 27 Mar 2025

Large Language Models Can Verbatim Reproduce Long Malicious Sequences
Sharon Lin, Krishnamurthy Dvijotham, Jamie Hayes, Chongyang Shi, Ilia Shumailov, Shuang Song · AAML · 21 Mar 2025

Improving Your Model Ranking on Chatbot Arena by Vote Rigging
Rui Min, Tianyu Pang, Chao Du, Qian Liu, Minhao Cheng, Min-Bin Lin · AAML · 29 Jan 2025

Neutralizing Backdoors through Information Conflicts for Large Language Models
Chen Chen, Yuchen Sun, Xueluan Gong, Jiaxin Gao, K. Lam · KELM, AAML · 27 Nov 2024

PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning
Zhen Sun, Tianshuo Cong, Yule Liu, Chenhao Lin, Xinlei He, Rongmao Chen, Xingshuo Han, Xinyi Huang · AAML · 26 Nov 2024

CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization
Nay Myat Min, Long H. Pham, Yige Li, Jun Sun · AAML · 18 Nov 2024

Securing Federated Learning against Backdoor Threats with Foundation Model Integration
Xiaohuan Bi, Xi Li · 23 Oct 2024

AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment
Pankayaraj Pathmanathan, Udari Madhushani Sehwag, Michael-Andrei Panaitescu-Liess, Furong Huang · SILM, AAML · 15 Oct 2024

PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
Tingchen Fu, Mrinank Sharma, Philip Torr, Shay B. Cohen, David M. Krueger, Fazl Barez · AAML · 11 Oct 2024

Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges
Qin Liu, Wenjie Mo, Terry Tong, Lyne Tchapmi, Fei Wang, Chaowei Xiao, Muhao Chen · AAML · 30 Sep 2024

PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs
Jiahao Yu, Yangguang Shao, Hanwen Miao, Junzheng Shi · SILM, AAML · 23 Sep 2024

Conversational Complexity for Assessing Risk in Large Language Models
John Burden, Manuel Cebrian, José Hernández-Orallo · 02 Sep 2024

Protecting against simultaneous data poisoning attacks
Neel Alex, Shoaib Ahmed Siddiqui, Amartya Sanyal, David M. Krueger · AAML · 23 Aug 2024

BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models
Yige Li, Hanxun Huang, Yunhan Zhao, Xingjun Ma, Jun Sun · AAML, SILM · 23 Aug 2024

Turning Generative Models Degenerate: The Power of Data Poisoning Attacks
Shuli Jiang, S. Kadhe, Yi Zhou, Farhan Ahmed, Ling Cai, Nathalie Baracaldo · SILM, AAML · 17 Jul 2024

Noisy Neighbors: Efficient membership inference attacks against LLMs
Filippo Galli, Luca Melis, Tommaso Cucinotta · 24 Jun 2024

Adversarial Attacks on Large Language Models in Medicine
Yifan Yang, Qiao Jin, Furong Huang, Zhiyong Lu · AAML · 18 Jun 2024

Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models
Xi Li, Yusen Zhang, Renze Lou, Chen Wu, Jiaqi Wang · LRM, AAML · 10 Jun 2024

AI Risk Management Should Incorporate Both Safety and Security
Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, ..., Chaowei Xiao, Bo-wen Li, Dawn Song, Peter Henderson, Prateek Mittal · AAML · 29 May 2024

TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models
Yuzhou Nie, Yanting Wang, Jinyuan Jia, Michael J. De Lucia, Nathaniel D. Bastian, Wenbo Guo, Dawn Song · SILM, AAML · 27 May 2024

Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization
Dixuan Wang, Yanda Li, Junyuan Jiang, Zepeng Ding, Ziqin Luo, Guochao Jiang, Jiaqing Liang, Deqing Yang · 27 May 2024

Meanings and Feelings of Large Language Models: Observability of Latent States in Generative AI
Tian Yu Liu, Stefano Soatto, Matteo Marchi, Pratik Chaudhari, Paulo Tabuada · AI4CE · 22 May 2024

Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals
Joshua Clymer, Caden Juang, Severin Field · CVBM · 08 May 2024

Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
Tim Baumgärtner, Yang Gao, Dana Alon, Donald Metzler · AAML · 08 Apr 2024

Mapping LLM Security Landscapes: A Comprehensive Stakeholder Risk Assessment Proposal
Rahul Pankajakshan, Sumitra Biswal, Yuvaraj Govindarajulu, Gilad Gressel · 20 Mar 2024

On Protecting the Data Privacy of Large Language Models (LLMs): A Survey
Biwei Yan, Kun Li, Minghui Xu, Yueyan Dong, Yue Zhang, Zhaochun Ren, Xiuzhen Cheng · AILaw, PILM · 08 Mar 2024

Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment
Jiong Wang, Jiazhao Li, Yiquan Li, Xiangyu Qi, Junjie Hu, Yixuan Li, P. McDaniel, Muhao Chen, Bo Li, Chaowei Xiao · AAML, SILM · 22 Feb 2024

VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models
Jiawei Liang, Siyuan Liang, Man Luo, Aishan Liu, Dongchen Han, Ee-Chien Chang, Xiaochun Cao · 21 Feb 2024

Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning
Shuai Zhao, Leilei Gan, Anh Tuan Luu, Jie Fu, Lingjuan Lyu, Meihuizi Jia, Jinming Wen · AAML · 19 Feb 2024

Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
Wenkai Yang, Xiaohan Bi, Yankai Lin, Sishuo Chen, Jie Zhou, Xu Sun · LLMAG, AAML · 17 Feb 2024

Machine Unlearning in Large Language Models
Kongyang Chen, Zixin Wang, Bing Mi, Waixi Liu, Shaowei Wang, Xiaojun Ren, Jiaxing Shen · MU · 03 Feb 2024

BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
Zhen Xiang, Fengqing Jiang, Zidi Xiong, Bhaskar Ramasubramanian, Radha Poovendran, Bo Li · LRM, SILM · 20 Jan 2024

A Comprehensive Survey of Attack Techniques, Implementation, and Mitigation Strategies in Large Language Models
Aysan Esmradi, Daniel Wankit Yip, C. Chan · AAML · 18 Dec 2023

Poisoned ChatGPT Finds Work for Idle Hands: Exploring Developers' Coding Practices with Insecure Suggestions from Poisoned AI Models
Sanghak Oh, Kiho Lee, Seonhye Park, Doowon Kim, Hyoungshick Kim · SILM · 11 Dec 2023

Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks
Shuli Jiang, S. Kadhe, Yi Zhou, Ling Cai, Nathalie Baracaldo · SILM, AAML · 07 Dec 2023

The Philosopher's Stone: Trojaning Plugins of Large Language Models
Tian Dong, Minhui Xue, Guoxing Chen, Rayne Holland, Shaofeng Li, Yan Meng, Zhen Liu, Haojin Zhu · AAML · 01 Dec 2023

Unveiling Backdoor Risks Brought by Foundation Models in Heterogeneous Federated Learning
Xi Li, Chen Henry Wu, Jiaqi Wang · AAML · 30 Nov 2023

Backdoor Threats from Compromised Foundation Models to Federated Learning
Xi Li, Songhe Wang, Chen Henry Wu, Hao Zhou, Jiaqi Wang · 31 Oct 2023

Formalizing and Benchmarking Prompt Injection Attacks and Defenses
Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong · SILM, LLMAG · 19 Oct 2023

Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li, Yulin Chen, Jinglong Luo, Yan Kang, Xiaojin Zhang, Qi Hu, Chunkit Chan, Yangqiu Song · PILM · 16 Oct 2023

Composite Backdoor Attacks Against Large Language Models
Hai Huang, Zhengyu Zhao, Michael Backes, Yun Shen, Yang Zhang · AAML · 11 Oct 2023

Building Privacy-Preserving and Secure Geospatial Artificial Intelligence Foundation Models
Jinmeng Rao, Song Gao, Gengchen Mai, Joanna M. Wardlaw · 29 Sep 2023

Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review
Pengzhou Cheng, Zongru Wu, Wei Du, Haodong Zhao, Wei Lu, Gongshen Liu · SILM, AAML · 12 Sep 2023

A Comprehensive Overview of Backdoor Attacks in Large Language Models within Communication Networks
Haomiao Yang, Kunlan Xiang, Mengyu Ge, Hongwei Li, Rongxing Lu, Shui Yu · SILM · 28 Aug 2023

Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
Maximilian Mozes, Xuanli He, Bennett Kleinberg, Lewis D. Griffin · 24 Aug 2023

Hiding Backdoors within Event Sequence Data via Poisoning Attacks
Elizaveta Kovtun, A. Ermilova, Dmitry Berestnev, Alexey Zaytsev · SILM, AAML · 20 Aug 2023

Self-Deception: Reverse Penetrating the Semantic Firewall of Large Language Models
Zhenhua Wang, Wei Xie, Kai Chen, Baosheng Wang, Zhiwen Gui, Enze Wang · AAML, SILM · 16 Aug 2023

Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection
Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin · SILM · 31 Jul 2023

On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook
Mingyuan Fan, Chengyu Wang, Cen Chen, Yang Liu, Jun Huang · HILM · 31 Jul 2023