2402.09179
Cited By
Instruction Backdoor Attacks Against Customized LLMs
14 February 2024
Rui Zhang
Hongwei Li
Rui Wen
Wenbo Jiang
Yuan Zhang
Michael Backes
Yun Shen
Yang Zhang
AAML
SILM
Papers citing "Instruction Backdoor Attacks Against Customized LLMs"
24 / 24 papers shown
Title
Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents
Pengzhou Cheng
Haowen Hu
Zheng Wu
Zongru Wu
Tianjie Ju
Zhuosheng Zhang
LLMAG
AAML
75
0
0
20 May 2025
From Assistants to Adversaries: Exploring the Security Risks of Mobile LLM Agents
Liangxuan Wu
Chao Wang
Tianming Liu
Yanjie Zhao
Haoyu Wang
AAML
53
0
0
19 May 2025
Backdoor Attacks Against Patch-based Mixture of Experts
Cedric Chan
Jona te Lintelo
S. Picek
AAML
MoE
368
0
0
03 May 2025
PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization
Yang Jiao
Xiao Wang
Kai Yang
AAML
SILM
88
0
0
10 Apr 2025
Neural Honeytrace: A Robust Plug-and-Play Watermarking Framework against Model Extraction Attacks
Yixiao Xu
Binxing Fang
Rui Wang
Yinghai Zhou
S. Ji
Yuan Liu
Mohan Li
AAML
MIACV
115
0
0
16 Jan 2025
Attention Tracker: Detecting Prompt Injection Attacks in LLMs
Kuo-Han Hung
Ching-Yun Ko
Ambrish Rawat
I-Hsin Chung
Winston H. Hsu
Pin-Yu Chen
99
8
0
01 Nov 2024
Large Language Models for Blockchain Security: A Systematic Literature Review
Zheyuan He
Zihao Li
Sen Yang
Ao Qiao
Xiaosong Zhang
Xiapu Luo
Ting Chen
PILM
77
16
0
21 Mar 2024
Mixtral of Experts
Albert Q. Jiang
Alexandre Sablayrolles
Antoine Roux
A. Mensch
Blanche Savary
...
Théophile Gervet
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE
LLMAG
151
1,083
0
08 Jan 2024
Hijacking Large Language Models via Adversarial In-Context Learning
Yao Qiang
Xiangyu Zhou
Saleh Zare Zade
Prashant Khanduri
Dongxiao Zhu
91
35
0
16 Nov 2023
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
291
1,451
0
27 Jul 2023
ProPILE: Probing Privacy Leakage in Large Language Models
Siwon Kim
Sangdoo Yun
Hwaran Lee
Martin Gubri
Sungroh Yoon
Seong Joon Oh
PILM
460
105
3
04 Jul 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
871
12,916
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
799
9,351
0
28 Jan 2022
Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey
Bonan Min
Hayley L Ross
Elior Sulem
Amir Pouran Ben Veyseh
Thien Huu Nguyen
Oscar Sainz
Eneko Agirre
Ilana Heinz
Dan Roth
LM&MA
VLM
AI4CE
132
1,074
0
01 Nov 2021
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
274
4,397
0
27 Oct 2021
Anti-Backdoor Learning: Training Clean Models on Poisoned Data
Yige Li
X. Lyu
Nodens Koren
Lingjuan Lyu
Yue Liu
Xingjun Ma
OnRL
68
334
0
22 Oct 2021
BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning
Jinyuan Jia
Yupei Liu
Neil Zhenqiang Gong
SILM
SSL
82
158
0
01 Aug 2021
Poisoning and Backdooring Contrastive Learning
Nicholas Carlini
Andreas Terzis
58
166
0
17 Jun 2021
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
486
1,917
0
14 Dec 2020
Backdoor Learning: A Survey
Yiming Li
Yong Jiang
Zhifeng Li
Shutao Xia
AAML
101
603
0
17 Jul 2020
Attack of the Tails: Yes, You Really Can Backdoor Federated Learning
Hongyi Wang
Kartik K. Sreenivasan
Shashank Rajput
Harit Vishwakarma
Saurabh Agarwal
Jy-yong Sohn
Kangwook Lee
Dimitris Papailiopoulos
FedML
76
605
0
09 Jul 2020
The relationship between trust in AI and trustworthy machine learning technologies
Ehsan Toreini
Mhairi Aitken
Kovila P. L. Coopamootoo
Karen Elliott
Carlos Vladimiro Gonzalez Zelaya
Aad van Moorsel
FaML
55
258
0
27 Nov 2019
SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization
Bogdan Gliwa
Iwona Mochol
M. Biesek
A. Wawer
117
631
0
27 Nov 2019
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
Mohit Iyyer
John Wieting
Kevin Gimpel
Luke Zettlemoyer
AAML
GAN
327
719
0
17 Apr 2018