2505.17735
Cited By
Automating Safety Enhancement for LLM-based Agents with Synthetic Risk Scenarios
23 May 2025
Xueyang Zhou
Weidong Wang
Lin Lu
Jiawen Shi
Guiyao Tie
Yongtian Xu
Lixing Chen
Pan Zhou
Neil Zhenqiang Gong
Lichao Sun
LLMAG
Papers citing
"Automating Safety Enhancement for LLM-based Agents with Synthetic Risk Scenarios"
27 / 27 papers shown
Title
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents
Junkai Li
Yunghwei Lai
Weitao Li
Jingyi Ren
Meng Zhang
...
Siyu Wang
Ziwei Sun
Yanzhe Zhang
Weizhi Ma
Yang Liu
LLMAG
LM&MA
LM&Ro
MedIm
133
108
0
20 Jan 2025
Align-Pro: A Principled Approach to Prompt Optimization for LLM Alignment
Prashant Trivedi
Souradip Chakraborty
Avinash Reddy
Vaneet Aggarwal
Amrit Singh Bedi
George K. Atia
41
2
0
08 Jan 2025
Attacking Vision-Language Computer Agents via Pop-ups
Yanzhe Zhang
Tao Yu
Diyi Yang
AAML
VLM
60
25
0
04 Nov 2024
GPT-4o System Card
OpenAI
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
126
750
0
25 Oct 2024
AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents
Edoardo Debenedetti
Jie Zhang
Mislav Balunović
Luca Beurer-Kellner
Marc Fischer
Florian Tramèr
LLMAG
AAML
73
33
1
19 Jun 2024
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
Team GLM
Aohan Zeng
Bin Xu
Bowen Wang
...
Zhaoyu Wang
Zhen Yang
Zhengxiao Du
Zhenyu Hou
Zihan Wang
ALM
102
566
0
18 Jun 2024
InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents
Qiusi Zhan
Zhixiang Liang
Zifan Ying
Daniel Kang
LLMAG
78
90
0
05 Mar 2024
How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries
Somnath Banerjee
Sayan Layek
Rima Hazra
Animesh Mukherjee
49
15
0
23 Feb 2024
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
Wenkai Yang
Xiaohan Bi
Yankai Lin
Sishuo Chen
Jie Zhou
Xu Sun
LLMAG
AAML
75
60
0
17 Feb 2024
Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science
Xiangru Tang
Qiao Jin
Kunlun Zhu
Tongxin Yuan
Yichi Zhang
...
Jian Tang
Zhuosheng Zhang
Arman Cohan
Zhiyong Lu
Mark B. Gerstein
LLMAG
ELM
58
47
0
06 Feb 2024
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
Tongxin Yuan
Zhiwei He
Lingzhong Dong
Yiming Wang
Ruijie Zhao
...
Binglin Zhou
Fangqi Li
Zhuosheng Zhang
Rui Wang
Gongshen Liu
ELM
48
71
0
18 Jan 2024
A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly
Yifan Yao
Jinhao Duan
Kaidi Xu
Yuanfang Cai
Eric Sun
Yue Zhang
PILM
ELM
83
504
0
04 Dec 2023
Evil Geniuses: Delving into the Safety of LLM-based Agents
Yu Tian
Xiao Yang
Jingyuan Zhang
Yinpeng Dong
Hang Su
LLMAG
AAML
59
61
0
20 Nov 2023
Testing Language Model Agents Safely in the Wild
Silen Naihin
David Atkinson
Marc Green
Merwane Hamadi
Craig Swift
Douglas Schonholtz
Adam Tauman Kalai
David Bau
LLMAG
39
20
0
17 Nov 2023
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
Zhibin Gou
Zhihong Shao
Yeyun Gong
Yelong Shen
Yujiu Yang
Minlie Huang
Nan Duan
Weizhu Chen
LRM
AI4CE
LLMAG
66
151
0
29 Sep 2023
Identifying the Risks of LM Agents with an LM-Emulated Sandbox
Yangjun Ruan
Honghua Dong
Andrew Wang
Silviu Pitis
Yongchao Zhou
Jimmy Ba
Yann Dubois
Chris J. Maddison
Tatsunori Hashimoto
LLMAG
ELM
32
106
0
25 Sep 2023
Plug in the Safety Chip: Enforcing Constraints for LLM-driven Robot Agents
Ziyi Yang
S. S. Raman
Ankit Parag Shah
Stefanie Tellex
LLMAG
40
41
0
18 Sep 2023
Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM
Bochuan Cao
Yu Cao
Lu Lin
Jinghui Chen
AAML
38
145
0
18 Sep 2023
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
Jiaming Ji
Mickel Liu
Juntao Dai
Xuehai Pan
Chi Zhang
Ce Bian
Chi Zhang
Ruiyang Sun
Yizhou Wang
Yaodong Yang
ALM
64
460
0
10 Jul 2023
Jailbroken: How Does LLM Safety Training Fail?
Alexander Wei
Nika Haghtalab
Jacob Steinhardt
158
928
0
05 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
233
4,186
0
09 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
276
3,712
0
29 May 2023
Reflexion: Language Agents with Verbal Reinforcement Learning
Noah Shinn
Federico Cassano
Beck Labash
A. Gopinath
Karthik Narasimhan
Shunyu Yao
LLMAG
KELM
37
1,190
0
20 Mar 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
625
13,788
0
15 Mar 2023
Toolformer: Language Models Can Teach Themselves to Use Tools
Timo Schick
Jane Dwivedi-Yu
Roberto Dessì
Roberta Raileanu
Maria Lomeli
Luke Zettlemoyer
Nicola Cancedda
Thomas Scialom
SyDa
RALM
92
1,670
0
09 Feb 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
342
2,709
0
06 Oct 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
212
2,457
0
12 Apr 2022