Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.11855
Cited By
Evil Geniuses: Delving into the Safety of LLM-based Agents
20 November 2023
Yu Tian
Xiao Yang
Jingyuan Zhang
Yinpeng Dong
Hang Su
LLMAG
AAML
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Evil Geniuses: Delving into the Safety of LLM-based Agents"
12 / 12 papers shown
Title
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Shaokun Zhang
Ming Yin
Jieyu Zhang
Jing Liu
Zhiguang Han
...
Beibin Li
Chi Wang
H. Wang
Yuxiao Chen
Qingyun Wu
49
1
0
30 Apr 2025
RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models
Bang An
Shiyue Zhang
Mark Dredze
61
0
0
25 Apr 2025
Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks
Ang Li
Yin Zhou
Vethavikashini Chithrra Raghuram
Tom Goldstein
Micah Goldblum
AAML
86
7
0
12 Feb 2025
ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents
Ido Levy
Ben Wiesel
Sami Marreed
Alon Oved
Avi Yaeli
Segev Shlomov
LLMAG
37
15
0
09 Oct 2024
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Jing Jiang
Min-Bin Lin
47
8
0
09 Oct 2024
Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level
Xinyi Zeng
Yuying Shang
Yutao Zhu
Jingyuan Zhang
Yu Tian
AAML
154
2
0
09 Oct 2024
Holistic Automated Red Teaming for Large Language Models through Top-Down Test Case Generation and Multi-turn Interaction
Jinchuan Zhang
Yan Zhou
Yaxin Liu
Ziming Li
Songlin Hu
AAML
34
3
0
25 Sep 2024
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
Wenkai Yang
Xiaohan Bi
Yankai Lin
Sishuo Chen
Jie Zhou
Xu Sun
LLMAG
AAML
44
56
0
17 Feb 2024
AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models
Dong Shu
Mingyu Jin
Suiyuan Zhu
Beichen Wang
Zihao Zhou
Chong Zhang
Yongfeng Zhang
ELM
47
12
0
17 Jan 2024
Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models
Lucio La Cava
Andrea Tagarelli
LLMAG
AI4CE
63
13
0
13 Jan 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
119
303
0
19 Sep 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
232
1,754
0
07 Apr 2023
1