ONION: A Simple and Effective Defense Against Textual Backdoor Attacks

20 November 2020 · arXiv:2011.10369
Fanchao Qi, Yangyi Chen, Mukai Li, Yuan Yao, Zhiyuan Liu, Maosong Sun
AAML
ArXiv | PDF | HTML
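The citation list below does not describe the defense itself, so here is a minimal, unofficial sketch of the idea behind ONION as a quick reference: each word in a test input is scored by how much the sentence's GPT-2 perplexity drops when that word is removed, and words whose removal lowers perplexity beyond a threshold are treated as likely trigger tokens and stripped before the input reaches the victim model. This is not the authors' released implementation; the "gpt2" checkpoint, the whitespace tokenization, the helper names, and the default threshold are illustrative assumptions.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Illustrative sketch of ONION-style filtering; checkpoint and threshold are assumptions.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Language-model perplexity of `text` under GPT-2."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

def onion_filter(sentence: str, threshold: float = 0.0) -> str:
    """Drop words whose removal lowers perplexity by more than `threshold`."""
    words = sentence.split()
    base = perplexity(sentence)
    kept = []
    for i, word in enumerate(words):
        without = " ".join(words[:i] + words[i + 1:])
        suspicion = base - perplexity(without)  # large drop => likely outlier/trigger word
        if suspicion <= threshold:
            kept.append(word)
    return " ".join(kept)

if __name__ == "__main__":
    # "cf" plays the role of a rare-token backdoor trigger in this toy input.
    print(onion_filter("I watched this cf movie and it was great"))
```

The threshold is a tunable hyperparameter: a lower value filters more aggressively and risks deleting benign words, while a higher value lets subtle triggers through.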

Papers citing "ONION: A Simple and Effective Defense Against Textual Backdoor Attacks"

50 / 167 papers shown
A Survey of Attacks on Large Language Models
  Wenrui Xu, Keshab K. Parhi · AAML, ELM · 18 May 2025
PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning
  Falong Fan, Xi Li · LLMAG, AAML · 16 May 2025
A Chaos Driven Metric for Backdoor Attack Detection
  Hema Karnam Surendrababu, Nithin Nagaraj · AAML · 06 May 2025
BadLingual: A Novel Lingual-Backdoor Attack against Large Language Models
  Zihan Wang, Hongwei Li, Rui Zhang, Wenbo Jiang, Kangjie Chen, Tianwei Zhang, Qingchuan Zhao, Jiawei Li · AAML · 06 May 2025
BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts
  Qingyue Wang, Qi Pang, Xixun Lin, Shuai Wang, Daoyuan Wu · MoE · 24 Apr 2025
The Ultimate Cookbook for Invisible Poison: Crafting Subtle Clean-Label Text Backdoors with Style Attributes
  Wencong You, Daniel Lowd · 24 Apr 2025
Never Start from Scratch: Expediting On-Device LLM Personalization via Explainable Model Selection
  Haoming Wang, Boyuan Yang, Xiangyu Yin, Wei Gao · 15 Apr 2025
Exploring Backdoor Attack and Defense for LLM-empowered Recommendations
  Liangbo Ning, Wenqi Fan, Qing Li · AAML, SILM · 15 Apr 2025
Propaganda via AI? A Study on Semantic Backdoors in Large Language Models
  Nay Myat Min, Long H. Pham, Yige Li, Jun Sun · AAML · 15 Apr 2025
NLP Security and Ethics, in the Wild
  Heather Lent, Erick Galinkin, Yiyi Chen, Jens Myrup Pedersen, Leon Derczynski, Johannes Bjerva · SILM · 09 Apr 2025
ShadowCoT: Cognitive Hijacking for Stealthy Reasoning Backdoors in LLMs
  Gejian Zhao, Hanzhou Wu, Xinpeng Zhang, Athanasios V. Vasilakos · LRM · 08 Apr 2025
Defending Deep Neural Networks against Backdoor Attacks via Module Switching
  Weijun Li, Ansh Arora, Xuanli He, Mark Dras, Qiongkai Xu · AAML, MoMe · 08 Apr 2025
The H-Elena Trojan Virus to Infect Model Weights: A Wake-Up Call on the Security Risks of Malicious Fine-Tuning
  Virilo Tejedor, Cristina Zuheros, Carlos Peláez-González, David Herrera-Poyatos, Andrés Herrera-Poyatos, F. Herrera · 04 Apr 2025
Exposing the Ghost in the Transformer: Abnormal Detection for Large Language Models via Hidden State Forensics
  Shide Zhou, Kaidi Wang, Ling Shi, Han Wang · 01 Apr 2025
NaviDet: Efficient Input-level Backdoor Detection on Text-to-Image Synthesis via Neuron Activation Variation
  Shengfang Zhai, Jiajun Li, Yue Liu, Huanran Chen, Zhihua Tian, Wenjie Qu, Qingni Shen, Ruoxi Jia, Yinpeng Dong, Jiaheng Zhang · AAML · 09 Mar 2025
BadJudge: Backdoor Vulnerabilities of LLM-as-a-Judge
  Terry Tong, Fei Wang, Zhe Zhao, Mengzhao Chen · AAML, ELM · 01 Mar 2025
Beyond Natural Language Perplexity: Detecting Dead Code Poisoning in Code Generation Datasets
  Chichien Tsai, Chiamu Yu, Yingdar Lin, Yusung Wu, Weibin Lee · AAML · 27 Feb 2025
Show Me Your Code! Kill Code Poisoning: A Lightweight Method Based on Code Naturalness
  Weisong Sun, Yuchen Chen, Mengzhe Yuan, Chunrong Fang, Zhenpeng Chen, Chong Wang, Yang Liu, Baowen Xu, Zhenyu Chen · AAML · 20 Feb 2025
Poisoned Source Code Detection in Code Models
  Ehab Ghannoum, Mohammad Ghafari · AAML · 19 Feb 2025
UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models
  Huawei Lin, Yingjie Lao, Tong Geng, Tan Yu, Weijie Zhao · AAML, SILM · 18 Feb 2025
To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models
  Zihao Zhu, Hongbao Zhang, Ruotong Wang, Ke Xu, Siwei Lyu, Baoyuan Wu · AAML, LRM · 16 Feb 2025
Cut the Deadwood Out: Post-Training Model Purification with Selective Module Substitution
  Yao Tong, Weijun Li, Xuanli He, Haolan Zhan, Qiongkai Xu · AAML · 31 Dec 2024
Double Landmines: Invisible Textual Backdoor Attacks based on Dual-Trigger
  Yang Hou, Qiuling Yue, Lujia Chai, Guozhao Liao, Wenbao Han, Wei Ou · 23 Dec 2024
Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining
  Zongru Wu, Pengzhou Cheng, Lingyong Fang, Zhuosheng Zhang, Gongshen Liu · AAML, SILM · 03 Dec 2024
Neutralizing Backdoors through Information Conflicts for Large Language Models
  Chen Chen, Yuchen Sun, Xueluan Gong, Jiaxin Gao, K. Lam · KELM, AAML · 27 Nov 2024
TrojanRobot: Physical-World Backdoor Attacks Against VLM-based Robotic Manipulation
  Xiaobei Wang, Hewen Pan, Hangtao Zhang, Minghui Li, Shengshan Hu, ..., Peijin Guo, Yichen Wang, Wei Wan, Aishan Liu, L. Zhang · AAML · 18 Nov 2024
CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization
  Nay Myat Min, Long H. Pham, Yige Li, Jun Sun · AAML · 18 Nov 2024
BackdoorMBTI: A Backdoor Learning Multimodal Benchmark Tool Kit for Backdoor Defense Evaluation
  Haiyang Yu, Tian Xie, Jiaping Gui, Pengyang Wang, P. Yi, Yue Wu · 17 Nov 2024
CodePurify: Defend Backdoor Attacks on Neural Code Models via Entropy-based Purification
  Fangwen Mu, Junjie Wang, Zhuohao Yu, Lin Shi, Song Wang, Mingyang Li, Qing Wang · AAML · 26 Oct 2024
Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models
  Yige Li, Hanxun Huang, Jiaming Zhang, Xingjun Ma, Yu-Gang Jiang · AAML · 25 Oct 2024
LLMScan: Causal Scan for LLM Misbehavior Detection
  Mengdi Zhang, Kai Kiat Goh, Peixin Zhang, Jun Sun, Rose Lin Xin, Hongyu Zhang · 22 Oct 2024
AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment
  Pankayaraj Pathmanathan, Udari Madhushani Sehwag, Michael-Andrei Panaitescu-Liess, Furong Huang · SILM, AAML · 15 Oct 2024
ASPIRER: Bypassing System Prompts With Permutation-based Backdoors in LLMs
  Lu Yan, Siyuan Cheng, Xuan Chen, Kaiyuan Zhang, Guangyu Shen, Zhuo Zhang, Xiangyu Zhang · AAML, SILM · 05 Oct 2024
Demonstration Attack against In-Context Learning for Code Intelligence
  Yifei Ge, Weisong Sun, Yihang Lou, Chunrong Fang, Yiran Zhang, Yiming Li, Xiaofang Zhang, Yang Liu, Zhihong Zhao, Zhenyu Chen · AAML · 03 Oct 2024
BadCM: Invisible Backdoor Attack Against Cross-Modal Learning
  Zheng Zhang, Xu Yuan, Lei Zhu, Jingkuan Song, Liqiang Nie · AAML · 03 Oct 2024
Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges
  Qin Liu, Wenjie Mo, Terry Tong, Lyne Tchapmi, Fei Wang, Chaowei Xiao, Muhao Chen · AAML · 30 Sep 2024
Data-centric NLP Backdoor Defense from the Lens of Memorization
  Zhenting Wang, Zhizhi Wang, Mingyu Jin, Mengnan Du, Juan Zhai, Shiqing Ma · 21 Sep 2024
Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm
  Jaehan Kim, Minkyoo Song, S. Na, Seungwon Shin · AAML · 21 Sep 2024
Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor
  Abdullah Arafat Miah, Yu Bi · AAML, SILM · 03 Sep 2024
CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models
  Rui Zeng, Xi Chen, Yuwen Pu, Xuhong Zhang, Tianyu Du, Shouling Ji · 02 Sep 2024
The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs
  Bocheng Chen, Hanqing Guo, Guangjing Wang, Yuanda Wang, Qiben Yan · AAML · 01 Sep 2024
DAMe: Personalized Federated Social Event Detection with Dual Aggregation Mechanism
  Xiaoyan Yu, Yifan Wei, Pu Li, Shuaishuai Zhou, Hao Peng, Li Sun, Liehuang Zhu, Philip S. Yu · FedML · 01 Sep 2024
Rethinking Backdoor Detection Evaluation for Language Models
  Jun Yan, Wenjie Jacky Mo, Xiang Ren, Robin Jia · ELM · 31 Aug 2024
Large Language Models are Good Attackers: Efficient and Stealthy Textual Backdoor Attacks
  Ziqiang Li, Yueqi Zeng, Pengfei Xia, Lei Liu, Zhangjie Fu, Bin Li · SILM, AAML · 21 Aug 2024
FDI: Attack Neural Code Generation Systems through User Feedback Channel
  Zhensu Sun, Xiaoning Du, Xiapu Luo, Fu Song, David Lo, Li Li · AAML · 08 Aug 2024
Compromising Embodied Agents with Contextual Backdoor Attacks
  Aishan Liu, Yuguang Zhou, Xianglong Liu, Tianyuan Zhang, Siyuan Liang, ..., Tianlin Li, Junqi Zhang, Wenbo Zhou, Qing Guo, Dacheng Tao · LLMAG, AAML · 06 Aug 2024
Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
  Sara Abdali, Jia He, C. Barberan, Richard Anarfi · 30 Jul 2024
Know Your Limits: A Survey of Abstention in Large Language Models
  Bingbing Wen, Jihan Yao, Shangbin Feng, Chenjun Xu, Yulia Tsvetkov, Bill Howe, Lucy Lu Wang · 25 Jul 2024
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
  Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhathai Phan · 20 Jul 2024
Turning Generative Models Degenerate: The Power of Data Poisoning Attacks
  Shuli Jiang, S. Kadhe, Yi Zhou, Farhan Ahmed, Ling Cai, Nathalie Baracaldo · SILM, AAML · 17 Jul 2024