Hidden Backdoors in Human-Centric Language Models
1 May 2021
Shaofeng Li, Hui Liu, Tian Dong, Benjamin Zi Hao Zhao, Minhui Xue, Haojin Zhu, Jialiang Lu
SILM

Papers citing "Hidden Backdoors in Human-Centric Language Models"

50 of 79 citing papers shown

A Systematic Review of Poisoning Attacks Against Large Language Models
Neil Fendley, Edward W. Staley, Joshua Carney, William Redman, Marie Chau, Nathan G. Drenkow
AAML, PILM
06 Jun 2025

A Survey of Attacks on Large Language Models
Wenrui Xu, Keshab K. Parhi
AAML, ELM
18 May 2025

The Ripple Effect: On Unforeseen Complications of Backdoor Attacks
Rui Zhang, Yun Shen, Hongwei Li, Wenbo Jiang, Hanxiao Chen, Yuan Zhang, Guowen Xu, Yang Zhang
SILM, AAML
16 May 2025

BadLingual: A Novel Lingual-Backdoor Attack against Large Language Models
Ziyi Wang, Hongwei Li, Rui Zhang, Wenbo Jiang, Kangjie Chen, Tianwei Zhang, Qingchuan Zhao, Guowen Xu
AAML
06 May 2025

LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures
Francisco Aguilera-Martínez, Fernando Berzal
PILM
02 May 2025

Double Landmines: Invisible Textual Backdoor Attacks based on Dual-Trigger
Yang Hou, Qiuling Yue, Lujia Chai, Guozhao Liao, Wenbao Han, Wei Ou
23 Dec 2024

CodePurify: Defend Backdoor Attacks on Neural Code Models via Entropy-based Purification
Fangwen Mu, Junjie Wang, Zhuohao Yu, Lin Shi, Song Wang, Mingyang Li, Qing Wang
AAML
26 Oct 2024

Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models
Cody Clop, Yannick Teglia
AAML, SILM, RALM
18 Oct 2024

Data-centric NLP Backdoor Defense from the Lens of Memorization
Zhenting Wang, Zhizhi Wang, Mingyu Jin, Mengnan Du, Juan Zhai, Shiqing Ma
21 Sep 2024

Context is the Key: Backdoor Attacks for In-Context Learning with Vision Transformers
Gorka Abad, S. Picek, Lorenzo Cavallaro, A. Urbieta
SILM
06 Sep 2024

Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor
Abdullah Arafat Miah, Yu Bi
AAML, SILM
03 Sep 2024

CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models
Rui Zeng, Xi Chen, Yuwen Pu, Xuhong Zhang, Tianyu Du, Shouling Ji
02 Sep 2024

The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs
Bocheng Chen, Hanqing Guo, Guangjing Wang, Yuanda Wang, Qiben Yan
AAML
01 Sep 2024

Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhathai Phan
20 Jul 2024

Securing Multi-turn Conversational Language Models Against Distributed Backdoor Triggers
Terry Tong, Lyne Tchapmi, Qin Liu, Muhao Chen
AAML, SILM
04 Jul 2024

Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs
Sara Price, Arjun Panickssery, Sam Bowman, Asa Cooper Stickland
LLMSV
04 Jul 2024

BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents
Yifei Wang, Dizhan Xue, Shengjie Zhang, Shengsheng Qian
AAML, LLMAG
05 Jun 2024

BadActs: A Universal Backdoor Defense in the Activation Space
Biao Yi, Sishuo Chen, Yiming Li, Tong Li, Baolei Zhang, Zheli Liu
AAML
18 May 2024

ModelShield: Adaptive and Robust Watermark against Model Extraction Attack
Kaiyi Pang, Tao Qi, Chuhan Wu, Minhao Bai, Minghu Jiang, Yongfeng Huang
AAML, WaLM
03 May 2024

Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security
Yihe Fan, Yuxin Cao, Ziyu Zhao, Ziyao Liu, Shaofeng Li
08 Apr 2024

Backdoor Attack on Multilingual Machine Translation
Jun Wang, Xingliang Yuan, Xuanli He, Benjamin I. P. Rubinstein, Trevor Cohn
03 Apr 2024

Threats, Attacks, and Defenses in Machine Unlearning: A Survey
Ziyao Liu, Huanyi Ye, Chen Chen, Yongsen Zheng, K. Lam
AAML, MU
20 Mar 2024

Poisoning Programs by Un-Repairing Code: Security Concerns of AI-generated Code
Cristina Improta
SILM, AAML
11 Mar 2024

Exploring Advanced Methodologies in Security Evaluation for LLMs
Junming Huang, Jiawei Zhang, Qi Wang, Weihong Han, Yanchun Zhang
28 Feb 2024

Double-I Watermark: Protecting Model Copyright for LLM Fine-tuning
Shen Li, Liuyi Yao, Jinyang Gao, Lan Zhang, Yaliang Li
22 Feb 2024

Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning
Shuai Zhao, Leilei Gan, Anh Tuan Luu, Jie Fu, Lingjuan Lyu, Meihuizi Jia, Jinming Wen
AAML
19 Feb 2024

Test-Time Backdoor Attacks on Multimodal Large Language Models
Dong Lu, Tianyu Pang, Chao Du, Qian Liu, Xianjun Yang, Min Lin
AAML
13 Feb 2024

Punctuation Matters! Stealthy Backdoor Attack for Language Models
Xuan Sheng, Zhicheng Li, Zhaoyang Han, Xiangmao Chang, Piji Li
26 Dec 2023

Poisoned ChatGPT Finds Work for Idle Hands: Exploring Developers' Coding Practices with Insecure Suggestions from Poisoned AI Models
Sanghak Oh, Kiho Lee, Seonhye Park, Doowon Kim, Hyoungshick Kim
SILM
11 Dec 2023

The Philosopher's Stone: Trojaning Plugins of Large Language Models
Tian Dong, Minhui Xue, Guoxing Chen, Rayne Holland, Shaofeng Li, Yan Meng, Zhen Liu, Haojin Zhu
AAML
01 Dec 2023

A Survey on Federated Unlearning: Challenges, Methods, and Future Directions
Ziyao Liu, Yu Jiang, Jiyuan Shen, Minyi Peng, Kwok-Yan Lam, Xingliang Yuan, Xiaoning Liu
MU
31 Oct 2023

Defending Our Privacy With Backdoors
Dominik Hintersdorf, Lukas Struppek, Daniel Neider, Kristian Kersting
SILM, AAML
12 Oct 2023

Large Language Model Alignment: A Survey
Tianhao Shen, Renren Jin, Yufei Huang, Chuang Liu, Weilong Dong, Zishan Guo, Xinwei Wu, Yan Liu, Deyi Xiong
LM&MA
26 Sep 2023

RAI4IoE: Responsible AI for Enabling the Internet of Energy
Minhui Xue, Surya Nepal, Ling Liu, Subbu Sethuvenkatraman, Xingliang Yuan, Carsten Rudolph, Ruoxi Sun, Greg Eisenhauer
20 Sep 2023

Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review
Pengzhou Cheng, Zongru Wu, Wei Du, Haodong Zhao, Wei Lu, Gongshen Liu
SILM, AAML
12 Sep 2023

A Comprehensive Overview of Backdoor Attacks in Large Language Models within Communication Networks
Haomiao Yang, Kunlan Xiang, Mengyu Ge, Hongwei Li, Rongxing Lu, Shui Yu
SILM
28 Aug 2023

LMSanitator: Defending Prompt-Tuning Against Task-Agnostic Backdoors
Chengkun Wei, Wenlong Meng, Zhikun Zhang, M. Chen, Ming-Hui Zhao, Wenjing Fang, Lei Wang, Zihui Zhang, Wenzhi Chen
AAML
26 Aug 2023

Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks
Domenico Cotroneo, Cristina Improta, Pietro Liguori, R. Natella
SILM
04 Aug 2023

ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP
Lu Yan, Zhuo Zhang, Guanhong Tao, Kaiyuan Zhang, Xuan Chen, Guangyu Shen, Xiangyu Zhang
AAML, SILM
04 Aug 2023

From Prompt Injections to SQL Injection Attacks: How Protected is Your LLM-Integrated Web Application?
Rodrigo Pedro, Daniel Castro, Paulo Carreira, Nuno Santos
SILM, AAML
03 Aug 2023

LSF-IDM: Automotive Intrusion Detection Model with Lightweight Attribution and Semantic Fusion
Pengzhou Cheng, Lei Hua, Haobin Jiang, Gongshen Liu
02 Aug 2023

When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities
Jin Chen, Zheng Liu, Xunpeng Huang, Chenwang Wu, Qi Liu, ..., Yuxuan Lei, Xiaolong Chen, Xingmei Wang, Defu Lian, Enhong Chen
ALM
31 Jul 2023

Backdoor Attacks for In-Context Learning with Language Models
Nikhil Kandpal, Matthew Jagielski, Florian Tramèr, Nicholas Carlini
SILM, AAML
27 Jul 2023

Efficient Backdoor Attacks for Deep Neural Networks in Real-world Scenarios
Ziqiang Li, Hong Sun, Pengfei Xia, Heng Li, Beihao Xia, Yi Wu, Bin Li
AAML
14 Jun 2023

UMD: Unsupervised Model Detection for X2X Backdoor Attacks
Zhen Xiang, Zidi Xiong, Yue Liu
AAML
29 May 2023

Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks
Abhinav Rao, S. Vashistha, Atharva Naik, Somak Aditya, Monojit Choudhury
24 May 2023

A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
Xiaowei Huang, Wenjie Ruan, Wei Huang, Gao Jin, Yizhen Dong, ..., Sihao Wu, Peipei Xu, Dengyu Wu, André Freitas, Mustafa A. Mustafa
ALM
19 May 2023

Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark
Wenjun Peng, Jingwei Yi, Fangzhao Wu, Shangxi Wu, Bin Zhu, Lingjuan Lyu, Binxing Jiao, Tongye Xu, Guangzhong Sun, Xing Xie
WaLM
17 May 2023

Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning
Shengfang Zhai, Yinpeng Dong, Qingni Shen, Shih-Chieh Pu, Yuejian Fang, Hang Su
07 May 2023

ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger
Jiazhao Li, Yijin Yang, Zhuofeng Wu, V. Vydiswaran, Chaowei Xiao
SILM
27 Apr 2023