Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models
Wenkai Yang, Lei Li, Zhiyuan Zhang, Xuancheng Ren, Xu Sun, Bin He · SILM · 29 March 2021 · arXiv:2103.15543
Papers citing "Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models" (33 of 33 papers shown)
LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures
Francisco Aguilera-Martínez, Fernando Berzal · PILM · 02 May 2025

The Ultimate Cookbook for Invisible Poison: Crafting Subtle Clean-Label Text Backdoors with Style Attributes
Wencong You, Daniel Lowd · 24 Apr 2025

BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts
Qingyue Wang, Qi Pang, Xixun Lin, Shuai Wang, Daoyuan Wu · MoE · 24 Apr 2025

Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs
Himanshu Beniwal, Sailesh Panda, Birudugadda Srivibhav, Mayank Singh · 24 Feb 2025

Is poisoning a real threat to LLM alignment? Maybe more so than you think
Pankayaraj Pathmanathan, Souradip Chakraborty, Xiangyu Liu, Yongyuan Liang, Furong Huang · AAML · 17 Jun 2024

Two Heads are Better than One: Nested PoE for Robust Defense Against Multi-Backdoors
Victoria Graf, Qin Liu, Muhao Chen · AAML · 02 Apr 2024

Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment
Jiong Wang, Jiazhao Li, Yiquan Li, Xiangyu Qi, Junjie Hu, Yixuan Li, P. McDaniel, Muhao Chen, Bo Li, Chaowei Xiao · AAML, SILM · 22 Feb 2024

Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
Wenkai Yang, Xiaohan Bi, Yankai Lin, Sishuo Chen, Jie Zhou, Xu Sun · LLMAG, AAML · 17 Feb 2024

Manipulating Predictions over Discrete Inputs in Machine Teaching
Xiaodong Wu, Yufei Han, H. Dahrouj, Jianbing Ni, Zhenwen Liang, Xiangliang Zhang · 31 Jan 2024

Punctuation Matters! Stealthy Backdoor Attack for Language Models
Xuan Sheng, Zhicheng Li, Zhaoyang Han, Xiangmao Chang, Piji Li · 26 Dec 2023

Universal Jailbreak Backdoors from Poisoned Human Feedback
Javier Rando, Florian Tramèr · 24 Nov 2023

Efficient Trigger Word Insertion
Yueqi Zeng, Ziqiang Li, Pengfei Xia, Lei Liu, Bin Li · AAML · 23 Nov 2023

A Comprehensive Overview of Backdoor Attacks in Large Language Models within Communication Networks
Haomiao Yang, Kunlan Xiang, Mengyu Ge, Hongwei Li, Rongxing Lu, Shui Yu · SILM · 28 Aug 2023

From Shortcuts to Triggers: Backdoor Defense with Denoised PoE
Qin Liu, Fei Wang, Chaowei Xiao, Muhao Chen · AAML · 24 May 2023

A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
Xiaowei Huang, Wenjie Ruan, Wei Huang, Gao Jin, Yizhen Dong, ..., Sihao Wu, Peipei Xu, Dengyu Wu, André Freitas, Mustafa A. Mustafa · ALM · 19 May 2023

Diffusion Theory as a Scalpel: Detecting and Purifying Poisonous Dimensions in Pre-trained Language Models Caused by Backdoor or Bias
Zhiyuan Zhang, Deli Chen, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun · 08 May 2023

UNICORN: A Unified Backdoor Trigger Inversion Framework
Zhenting Wang, Kai Mei, Juan Zhai, Shiqing Ma · LLMSV · 05 Apr 2023

Backdoor Learning for NLP: Recent Advances, Challenges, and Future Research Directions
Marwan Omar · SILM, AAML · 14 Feb 2023

Stealthy Backdoor Attack for Code Models
Zhou Yang, Bowen Xu, Jie M. Zhang, Hong Jin Kang, Jieke Shi, Junda He, David Lo · AAML · 06 Jan 2023

UPTON: Preventing Authorship Leakage from Public Text Release via Data Poisoning
Ziyao Wang, Thai Le, Dongwon Lee · 17 Nov 2022

BadRes: Reveal the Backdoors through Residual Connection
Min He, Tianyu Chen, Haoyi Zhou, Shanghang Zhang, Jianxin Li · 15 Sep 2022

Catch Me If You Can: Deceiving Stance Detection and Geotagging Models to Protect Privacy of Individuals on Twitter
Dilara Doğan, Bahadir Altun, Muhammed Said Zengin, Mucahid Kutlu, Tamer Elsayed · 23 Jul 2022

Backdoor Attacks in Federated Learning by Rare Embeddings and Gradient Ensembling
Kiyoon Yoo, Nojun Kwak · SILM, AAML, FedML · 29 Apr 2022
Constrained Optimization with Dynamic Bound-scaling for Effective NLP Backdoor Defense
Guangyu Shen, Yingqi Liu, Guanhong Tao, Qiuling Xu, Zhuo Zhang, Shengwei An, Shiqing Ma, Xinming Zhang · AAML · 11 Feb 2022
Spinning Language Models: Risks of Propaganda-As-A-Service and Countermeasures
Eugene Bagdasaryan, Vitaly Shmatikov · SILM, AAML · 09 Dec 2021

A General Framework for Defending Against Backdoor Attacks via Influence Graph
Xiaofei Sun, Jiwei Li, Xiaoya Li, Ziyao Wang, Tianwei Zhang, Han Qiu, Fei Wu, Chun Fan · AAML, TDI · 29 Nov 2021

Triggerless Backdoor Attack for NLP Tasks with Clean Labels
Leilei Gan, Jiwei Li, Tianwei Zhang, Xiaoya Li, Yuxian Meng, Fei Wu, Yi Yang, Shangwei Guo, Chun Fan · AAML, SILM · 15 Nov 2021

RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models
Wenkai Yang, Yankai Lin, Peng Li, Jie Zhou, Xu Sun · SILM, AAML · 15 Oct 2021

BadPre: Task-agnostic Backdoor Attacks to Pre-trained NLP Foundation Models
Kangjie Chen, Yuxian Meng, Xiaofei Sun, Shangwei Guo, Tianwei Zhang, Jiwei Li, Chun Fan · SILM · 06 Oct 2021

How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data
Zhiyuan Zhang, Lingjuan Lyu, Weiqiang Wang, Lichao Sun, Xu Sun · 03 Sep 2021

Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning
Linyang Li, Demin Song, Xiaonan Li, Jiehang Zeng, Ruotian Ma, Xipeng Qiu · 31 Aug 2021

Defending Against Backdoor Attacks in Natural Language Generation
Xiaofei Sun, Xiaoya Li, Yuxian Meng, Xiang Ao, Fei Wu, Jiwei Li, Tianwei Zhang · AAML, SILM · 03 Jun 2021

Backdoor Learning: A Survey
Yiming Li, Yong Jiang, Zhifeng Li, Shutao Xia · AAML · 17 Jul 2020