Weight Poisoning Attacks on Pre-trained Models
Keita Kurita, Paul Michel, Graham Neubig · 14 April 2020 · arXiv:2004.06660 · AAML, SILM

Papers citing "Weight Poisoning Attacks on Pre-trained Models"

50 / 97 papers shown
Threat Modeling for AI: The Case for an Asset-Centric Approach
Jose Sanchez Vicarte, Marcin Spoczynski, Mostafa Elsaid · 08 May 2025

Adversarial Attacks in Multimodal Systems: A Practitioner's Survey
Shashank Kapoor, Sanjay Surendranath Girija, Lakshit Arora, Dipen Pradhan, Ankit Shetgaonkar, Aman Raj · 06 May 2025 · AAML

A Survey on Privacy Risks and Protection in Large Language Models
Kang Chen, Xiuze Zhou, Yuanguo Lin, Shibo Feng, Li Shen, Pengcheng Wu · 04 May 2025 · AILaw, PILM

Backdoor Attacks Against Patch-based Mixture of Experts
Cedric Chan, Jona te Lintelo, S. Picek · 03 May 2025 · AAML, MoE

GaussTrap: Stealthy Poisoning Attacks on 3D Gaussian Splatting for Targeted Scene Confusion
Jiaxin Hong, Sixu Chen, Shuoyang Sun, Hongyao Yu, Hao Fang, Yuqi Tan, Bin Chen, Shuhan Qi, Jiawei Li · 29 Apr 2025 · 3DGS, AAML

BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts
Qingyue Wang, Qi Pang, Xixun Lin, Shuai Wang, Daoyuan Wu · 24 Apr 2025 · MoE

The Ultimate Cookbook for Invisible Poison: Crafting Subtle Clean-Label Text Backdoors with Style Attributes
Wencong You, Daniel Lowd · 24 Apr 2025

PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization
Yang Jiao, Xiao Wang, Kai Yang · 10 Apr 2025 · AAML, SILM

Revisiting Backdoor Attacks on Time Series Classification in the Frequency Domain
Yuanmin Huang, Mi Zhang, Zhaoxiang Wang, Wenxuan Li, Min Yang · 12 Mar 2025 · AAML, AI4TS

Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images
Yubo Wang, Jianting Tang, Chaohu Liu, Linli Xu · 23 Feb 2025 · AAML

When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations
Huaizhi Ge, Yiming Li, Qifan Wang, Yongfeng Zhang, Ruixiang Tang · 19 Nov 2024 · AAML, SILM

LLMScan: Causal Scan for LLM Misbehavior Detection
Mengdi Zhang, Kai Kiat Goh, Peixin Zhang, Jun Sun, Rose Lin Xin, Hongyu Zhang · 22 Oct 2024

NoiseAttack: An Evasive Sample-Specific Multi-Targeted Backdoor Attack Through White Gaussian Noise
Abdullah Arafat Miah, Kaan Icer, Resit Sendag, Yu Bi · 03 Sep 2024 · AAML, DiffM

Defending Code Language Models against Backdoor Attacks with Deceptive Cross-Entropy Loss
Guang Yang, Yu Zhou, Xiang Chen, Xiangyu Zhang, Terry Yue Zhuo, David Lo, Taolue Chen · 12 Jul 2024 · AAML

Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
Tim Baumgärtner, Yang Gao, Dana Alon, Donald Metzler · 08 Apr 2024 · AAML

Two Heads are Better than One: Nested PoE for Robust Defense Against Multi-Backdoors
Victoria Graf, Qin Liu, Muhao Chen · 02 Apr 2024 · AAML

Acquiring Clean Language Models from Backdoor Poisoned Datasets by Downscaling Frequency Space
Zongru Wu, Zhuosheng Zhang, Pengzhou Cheng, Gongshen Liu · 19 Feb 2024 · AAML

Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
Wenkai Yang, Xiaohan Bi, Yankai Lin, Sishuo Chen, Jie Zhou, Xu Sun · 17 Feb 2024 · LLMAG, AAML

Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models
Weijiao Zhang, Jindong Han, Zhao Xu, Hang Ni, Hao Liu, Hui Xiong · 30 Jan 2024 · AI4CE

Punctuation Matters! Stealthy Backdoor Attack for Language Models
Xuan Sheng, Zhicheng Li, Zhaoyang Han, Xiangmao Chang, Piji Li · 26 Dec 2023

Universal Jailbreak Backdoors from Poisoned Human Feedback
Javier Rando, Florian Tramèr · 24 Nov 2023

Efficient Trigger Word Insertion
Yueqi Zeng, Ziqiang Li, Pengfei Xia, Lei Liu, Bin Li · 23 Nov 2023 · AAML

RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models
Jiong Wang, Junlin Wu, Muhao Chen, Yevgeniy Vorobeychik, Chaowei Xiao · 16 Nov 2023 · AAML

Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations
Wenjie Mo, Lyne Tchapmi, Qin Liu, Jiong Wang, Jun Yan, Chaowei Xiao, Muhao Chen · 16 Nov 2023 · AAML

Beyond Detection: Unveiling Fairness Vulnerabilities in Abusive Language Models
Yueqing Liang, Lu Cheng, Ali Payani, Kai Shu · 15 Nov 2023

Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots
Ruixiang Tang, Jiayi Yuan, Yiming Li, Zirui Liu, Rui Chen, Xia Hu · 28 Oct 2023 · AAML

Everyone Can Attack: Repurpose Lossy Compression as a Natural Backdoor Attack
Sze Jue Yang, Q. Nguyen, Chee Seng Chan, Khoa D. Doan · 31 Aug 2023 · AAML, DiffM

Adversarial Clean Label Backdoor Attacks and Defenses on Text Classification Systems
Ashim Gupta, Amrith Krishna · 31 May 2023 · AAML

NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models
Kai Mei, Zheng Li, Zhenting Wang, Yang Zhang, Shiqing Ma · 28 May 2023 · AAML, SILM

Spoofing Attacker Also Benefits from Self-Supervised Pretrained Model
Aoi Ito, Shota Horiguchi · 24 May 2023 · SSL

From Shortcuts to Triggers: Backdoor Defense with Denoised PoE
Qin Liu, Fei Wang, Chaowei Xiao, Muhao Chen · 24 May 2023 · AAML

Mitigating Backdoor Poisoning Attacks through the Lens of Spurious Correlation
Xuanli He, Qiongkai Xu, Jun Wang, Benjamin I. P. Rubinstein, Trevor Cohn · 19 May 2023 · AAML

A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
Xiaowei Huang, Wenjie Ruan, Wei Huang, Gao Jin, Yizhen Dong, ..., Sihao Wu, Peipei Xu, Dengyu Wu, André Freitas, Mustafa A. Mustafa · 19 May 2023 · ALM

Diffusion Theory as a Scalpel: Detecting and Purifying Poisonous Dimensions in Pre-trained Language Models Caused by Backdoor or Bias
Zhiyuan Zhang, Deli Chen, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun · 08 May 2023

Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning
Shengfang Zhai, Yinpeng Dong, Qingni Shen, Shih-Chieh Pu, Yuejian Fang, Hang Su · 07 May 2023

Backdoor Learning on Sequence to Sequence Models
Lichang Chen, Minhao Cheng, Heng-Chiao Huang · 03 May 2023 · SILM

UNICORN: A Unified Backdoor Trigger Inversion Framework
Zhenting Wang, Kai Mei, Juan Zhai, Shiqing Ma · 05 Apr 2023 · LLMSV

An Empirical Study of Pre-Trained Model Reuse in the Hugging Face Deep Learning Model Registry
Wenxin Jiang, Nicholas Synovic, Matt Hyatt, Taylor R. Schorlemmer, R. Sethi, Yung-Hsiang Lu, George K. Thiruvathukal, James C. Davis · 05 Mar 2023

Backdoor Attacks to Pre-trained Unified Foundation Models
Zenghui Yuan, Yixin Liu, Kai Zhang, Pan Zhou, Lichao Sun · 18 Feb 2023 · AAML

Backdoor Learning for NLP: Recent Advances, Challenges, and Future Research Directions
Marwan Omar · 14 Feb 2023 · SILM, AAML

Explainable AI does not provide the explanations end-users are asking for
Savio Rozario, G. Cevora · 25 Jan 2023 · XAI

BDMMT: Backdoor Sample Detection for Language Models through Model Mutation Testing
Jiali Wei, Ming Fan, Wenjing Jiao, Wuxia Jin, Ting Liu · 25 Jan 2023 · AAML

Circumventing interpretability: How to defeat mind-readers
Lee D. Sharkey · 21 Dec 2022

Despite "super-human" performance, current LLMs are unsuited for decisions about ethics and safety
Joshua Albrecht, Ellie Kitanidis, Abraham J. Fetterman · 13 Dec 2022 · ELM, ReLM, ALM, LRM

UPTON: Preventing Authorship Leakage from Public Text Release via Data Poisoning
Ziyao Wang, Thai Le, Dongwon Lee · 17 Nov 2022

MSDT: Masked Language Model Scoring Defense in Text Domain
Jaechul Roh, Minhao Cheng, Yajun Fang · 10 Nov 2022 · AAML

Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis
Lukas Struppek, Dominik Hintersdorf, Kristian Kersting · 04 Nov 2022 · SILM

Dormant Neural Trojans
Feisi Fu, Panagiota Kiourti, Wenchao Li · 02 Nov 2022 · AAML

Poison Attack and Defense on Deep Source Code Processing Models
Jia Li, Zhuo Li, Huangzhao Zhang, Ge Li, Zhi Jin, Xing Hu, Xin Xia · 31 Oct 2022 · AAML

Apple of Sodom: Hidden Backdoors in Superior Sentence Embeddings via Contrastive Learning
Xiaoyi Chen, Baisong Xin, Shengfang Zhai, Shiqing Ma, Qingni Shen, Zhonghai Wu · 20 Oct 2022 · SILM