arXiv:2406.03007
BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents
5 June 2024
Yifei Wang
Dizhan Xue
Shengjie Zhang
Shengsheng Qian
AAML
LLMAG
Papers citing "BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents"
14 / 14 papers shown
A Survey of Attacks on Large Language Models
Wenrui Xu
Keshab K. Parhi
AAML
ELM
14
0
0
18 May 2025
UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models
Huawei Lin
Yingjie Lao
Tong Geng
Tan Yu
Weijie Zhao
AAML
SILM
79
2
0
18 Feb 2025
Mimicking the Familiar: Dynamic Command Generation for Information Theft Attacks in LLM Tool-Learning System
Ziyou Jiang
Mingyang Li
Guowei Yang
Junjie Wang
Yuekai Huang
Zhiyuan Chang
Qing Wang
AAML
54
1
0
17 Feb 2025
Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks
Ang Li
Yin Zhou
Vethavikashini Chithrra Raghuram
Tom Goldstein
Micah Goldblum
AAML
86
7
0
12 Feb 2025
AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents
Chejian Xu
Mintong Kang
Jiawei Zhang
Zeyi Liao
Lingbo Mo
Mengqi Yuan
Huan Sun
Bo Li
AAML
38
13
0
22 Oct 2024
SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization
Akrit Mudvari
Yuang Jiang
Leandros Tassiulas
30
2
0
14 Oct 2024
PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
Tingchen Fu
Mrinank Sharma
Philip Torr
Shay B. Cohen
David M. Krueger
Fazl Barez
AAML
50
7
0
11 Oct 2024
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
Hanrong Zhang
Jingyuan Huang
Kai Mei
Yifei Yao
Zhenting Wang
Chenlu Zhan
Hongwei Wang
Yongfeng Zhang
AAML
LLMAG
ELM
51
22
0
03 Oct 2024
Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges
Qin Liu
Wenjie Mo
Terry Tong
Lyne Tchapmi
Fei Wang
Chaowei Xiao
Muhao Chen
AAML
39
4
0
30 Sep 2024
Security and Privacy Challenges of Large Language Models: A Survey
B. Das
M. H. Amini
Yanzhao Wu
PILM
ELM
19
107
0
30 Jan 2024
Erasing Self-Supervised Learning Backdoor by Cluster Activation Masking
Shengsheng Qian
Yifei Wang
Dizhan Xue
Shengjie Zhang
Huaiwen Zhang
Changsheng Xu
AAML
43
1
0
13 Dec 2023
Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review
Pengzhou Cheng
Zongru Wu
Wei Du
Haodong Zhao
Wei Lu
Gongshen Liu
SILM
AAML
34
17
0
12 Sep 2023
Poisoning Language Models During Instruction Tuning
Alexander Wan
Eric Wallace
Sheng Shen
Dan Klein
SILM
104
186
0
01 May 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
339
12,003
0
04 Mar 2022