Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
Wenkai Yang, Xiaohan Bi, Yankai Lin, Sishuo Chen, Jie Zhou, Xu Sun
arXiv 2402.11208 · 17 February 2024 · LLMAG, AAML
Papers citing "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" (41 papers shown)

A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?
Ada Chen, Yongjiang Wu, Jingyang Zhang, Shu Yang, Jen-tse Huang, Kun Wang, Wenxuan Wang, Shuai Wang
ELM · 16 May 2025

A Survey of Scaling in Large Language Model Reasoning
Zihan Chen, Song Wang, Zhen Tan, Xingbo Fu, Zhenyu Lei, Peng Wang, Huan Liu, Cong Shen, Jundong Li
LRM · 02 Apr 2025

A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models
Liangbo Ning, Ziran Liang, Zhuohang Jiang, Haohao Qu, Yujuan Ding, ..., Xiao Wei, Shanru Lin, Hui Liu, Philip S. Yu, Qing Li
LLMAG, LM&Ro · 30 Mar 2025

AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents
Haoyu Wang, Christopher M. Poskitt, Jun Sun
24 Mar 2025

CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities
Yuxuan Zhu, Antony Kellermann, Dylan Bowman, Philip Li, Akul Gupta, ..., Avi Dhir, Sudhit Rao, Kaicheng Yu, Twm Stone, Daniel Kang
LLMAG, ELM · 21 Mar 2025

In-Context Defense in Computer Agents: An Empirical Study
Pei Yang, Hai Ci, Mike Zheng Shou
AAML, LLMAG · 12 Mar 2025

Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems
Pierre Peigne-Lefebvre, Mikolaj Kniejski, Filip Sondej, Matthieu David, J. Hoelscher-Obermaier, Christian Schroeder de Witt, Esben Kran
26 Feb 2025

Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs
Himanshu Beniwal, Sailesh Panda, Mayank Singh
24 Feb 2025

Unified Prompt Attack Against Text-to-Image Generation Models
Duo Peng, Qiuhong Ke, Mark He Huang, Ping Hu, Xiaozhong Liu
23 Feb 2025

Class-Conditional Neural Polarizer: A Lightweight and Effective Backdoor Defense by Purifying Poisoned Features
Mingli Zhu, Shaokui Wei, Hongyuan Zha, Baoyuan Wu
AAML · 23 Feb 2025

UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models
Huawei Lin, Yingjie Lao, Tong Geng, Tan Yu, Weijie Zhao
AAML, SILM · 18 Feb 2025

Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks
Ang Li, Yin Zhou, Vethavikashini Chithrra Raghuram, Tom Goldstein, Micah Goldblum
AAML · 12 Feb 2025

PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning
Zhen Sun, Tianshuo Cong, Yule Liu, Chenhao Lin, Xinlei He, Rongmao Chen, Xingshuo Han, Xinyi Huang
AAML · 26 Nov 2024

Attacking Vision-Language Computer Agents via Pop-ups
Yanzhe Zhang, Tao Yu, Diyi Yang
AAML, VLM · 04 Nov 2024

AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents
Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, Bo Li
AAML · 22 Oct 2024

PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
Tingchen Fu, Mrinank Sharma, Philip H. S. Torr, Shay B. Cohen, David M. Krueger, Fazl Barez
AAML · 11 Oct 2024

Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents
Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh, Scale Red Team, ..., Sean Hendryx, Shuyan Zhou, Matt Fredrikson, Summer Yue, Zifan Wang
LLMAG · 11 Oct 2024

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, Yongfeng Zhang
AAML, LLMAG, ELM · 03 Oct 2024

Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges
Qin Liu, Wenjie Mo, Terry Tong, Lyne Tchapmi, Fei Wang, Chaowei Xiao, Muhao Chen
AAML · 30 Sep 2024

A Disguised Wolf Is More Harmful Than a Toothless Tiger: Adaptive Malicious Code Injection Backdoor Attack Leveraging User Behavior as Triggers
Shangxi Wu, Jitao Sang
SILM, AAML · 19 Aug 2024

Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions
Xinbei Ma, Yiting Wang, Yao Yao, Tongxin Yuan, Aston Zhang, Zhuosheng Zhang, Hai Zhao
AAML, LLMAG · 05 Aug 2024

Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification
Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, Yang Zhang
LLMAG, AAML · 30 Jul 2024

The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies
Feng He, Tianqing Zhu, Dayong Ye, Bo Liu, Wanlei Zhou, Philip S. Yu
PILM, LLMAG, ELM · 28 Jul 2024

Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhathai Phan
20 Jul 2024

AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases
Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, Bo Li
LLMAG, AAML · 17 Jul 2024

Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs
Sara Price, Arjun Panickssery, Sam Bowman, Asa Cooper Stickland
LLMSV · 04 Jul 2024

Adversarial Attacks on Large Language Models in Medicine
Yifan Yang, Qiao Jin, Furong Huang, Zhiyong Lu
AAML · 18 Jun 2024

Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey
Shang Wang, Tianqing Zhu, Bo Liu, Ming Ding, Xu Guo, Dayong Ye, Wanlei Zhou, Philip S. Yu
PILM · 12 Jun 2024

AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways
Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, Yang Xiang
04 Jun 2024

TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models
Pengzhou Cheng, Yidong Ding, Tianjie Ju, Zongru Wu, Wei Du, Ping Yi, Zhuosheng Zhang, Gongshen Liu
SILM, AAML · 22 May 2024

BadActs: A Universal Backdoor Defense in the Activation Space
Biao Yi, Sishuo Chen, Yiming Li, Tong Li, Baolei Zhang, Zheli Liu
AAML · 18 May 2024

JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
Yingchaojie Feng, Zhizhang Chen, Zhining Kang, Sijia Wang, Minfeng Zhu, Wei Zhang, Wei Chen
12 Apr 2024

Exploring Backdoor Vulnerabilities of Chat Models
Yunzhuo Hao, Wenkai Yang, Yankai Lin
SILM, KELM · 03 Apr 2024

Optimization-based Prompt Injection Attack to LLM-as-a-Judge
Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong
AAML · 26 Mar 2024

ImgTrojan: Jailbreaking Vision-Language Models with ONE Image
Xijia Tao, Shuai Zhong, Lei Li, Qi Liu, Lingpeng Kong
05 Mar 2024

CoCo-Agent: A Comprehensive Cognitive MLLM Agent for Smartphone GUI Automation
Xinbei Ma, Zhuosheng Zhang, Hai Zhao
LLMAG · 19 Feb 2024

Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review
Pengzhou Cheng, Zongru Wu, Wei Du, Haodong Zhao, Wei Lu, Gongshen Liu
SILM, AAML · 12 Sep 2023

Poisoning Language Models During Instruction Tuning
Alexander Wan, Eric Wallace, Sheng Shen, Dan Klein
SILM · 01 May 2023

ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
LLMAG, ReLM, LRM · 06 Oct 2022

Large Language Models are Zero-Shot Reasoners
Takeshi Kojima, S. Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa
ReLM, LRM · 24 May 2022

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM · 04 Mar 2022