R-Judge: Benchmarking Safety Risk Awareness for LLM Agents


18 January 2024
Tongxin Yuan
Zhiwei He
Lingzhong Dong
Yiming Wang
Ruijie Zhao
Tian Xia
Lizhen Xu
Binglin Zhou
Fangqi Li
Zhuosheng Zhang
Rui Wang
Gongshen Liu
ELM

Papers citing "R-Judge: Benchmarking Safety Risk Awareness for LLM Agents"

46 / 46 papers shown
GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents
Lingxiao Diao
Xinyue Xu
Wanxuan Sun
Cheng Yang
Zhuosheng Zhang
LLMAG
ALM
ELM
7
0
0
16 May 2025
A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?
Ada Chen
Yongjiang Wu
Jun Zhang
Shu Yang
Jen-tse Huang
Kun Wang
Wenxuan Wang
Shuai Wang
ELM
12
0
0
16 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Zhaoxin Fan
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
86
1
0
26 Apr 2025
RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models
Bang An
Shiyue Zhang
Mark Dredze
61
0
0
25 Apr 2025
Efficient MAP Estimation of LLM Judgment Performance with Prior Transfer
Huaizhi Qu
Inyoung Choi
Zhen Tan
Song Wang
Sukwon Yun
Qi Long
Faizan Siddiqui
Kwonjoon Lee
Tianlong Chen
43
0
0
17 Apr 2025
Following the Whispers of Values: Unraveling Neural Mechanisms Behind Value-Oriented Behaviors in LLMs
Ling Hu
Yuemei Xu
Xiaoyang Gu
Letao Han
28
0
0
07 Apr 2025
A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models
Liangbo Ning
Ziran Liang
Zhuohang Jiang
Haohao Qu
Yujuan Ding
...
Xiao Wei
Shanru Lin
Hui Liu
Philip S. Yu
Qing Li
LLMAG
LM&Ro
91
6
0
30 Mar 2025
FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models
Dahyun Jung
Seungyoon Lee
Hyeonseok Moon
Chanjun Park
Heuiseok Lim
AAML
ALM
ELM
55
0
0
25 Mar 2025
Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings
Zonghao Ying
Guangyi Zheng
Yongxin Huang
Deyue Zhang
Wenxin Zhang
Quanchen Zou
Aishan Liu
X. Liu
Dacheng Tao
ELM
74
6
0
19 Mar 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
120
67
0
25 Nov 2024
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
Kung-Hsiang Huang
Akshara Prabhakar
Sidharth Dhawan
Yixin Mao
Huan Wang
Silvio Savarese
Caiming Xiong
Philippe Laban
C. Wu
44
7
0
04 Nov 2024
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
Yutao Mou
Shikun Zhang
Wei Ye
ELM
40
8
0
29 Oct 2024
Quantifying Risk Propensities of Large Language Models: Ethical Focus and Bias Detection through Role-Play
Yifan Zeng
Liang Kairong
Fangzhou Dong
Peijia Zheng
53
0
0
26 Oct 2024
MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control
Juyong Lee
Dongyoon Hahm
June Suk Choi
W. Bradley Knox
Kimin Lee
LLMAG
ELM
AAML
LM&Ro
43
2
0
23 Oct 2024
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Yuzhe Yang
Yifei Zhang
Yan Hu
Y. Guo
Ruoli Gan
...
Haining Wang
Qianqian Xie
Jimin Huang
Honghai Yu
Benyou Wang
ELM
AIFin
42
2
0
17 Oct 2024
SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI
Yu Yang
Yuzhou Nie
Zhun Wang
Yuheng Tang
Wenbo Guo
Bo Li
D. Song
ELM
38
6
0
14 Oct 2024
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
84
1
0
09 Oct 2024
ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents
Ido Levy
Ben Wiesel
Sami Marreed
Alon Oved
Avi Yaeli
Segev Shlomov
LLMAG
29
14
0
09 Oct 2024
RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph
Siru Ouyang
W. Yu
Kaixin Ma
Zilin Xiao
Z. Zhang
Mengzhao Jia
J. Han
H. Zhang
Dong Yu
54
12
0
03 Oct 2024
GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks
Rongchang Li
Minjie Chen
Chang Hu
Han Chen
Wenpeng Xing
Meng Han
SILM
ELM
39
1
0
29 Sep 2024
Holistic Automated Red Teaming for Large Language Models through Top-Down Test Case Generation and Multi-turn Interaction
Jinchuan Zhang
Yan Zhou
Yaxin Liu
Ziming Li
Songlin Hu
AAML
26
3
0
25 Sep 2024
Alignment with Preference Optimization Is All You Need for LLM Safety
Réda Alami
Ali Khalifa Almansoori
Ahmed Alzubaidi
M. Seddik
Mugariya Farooq
Hakim Hacid
32
1
0
12 Sep 2024
Athena: Safe Autonomous Agents with Verbal Contrastive Learning
Tanmana Sadhu
Ali Pesaranghader
Yanan Chen
Dong Hoon Yi
ELM
LLMAG
AAML
26
0
0
20 Aug 2024
MultiSurf-GPT: Facilitating Context-Aware Reasoning with Large-Scale Language Models for Multimodal Surface Sensing
Yongquan Hu
Black Sun
Pengcheng An
Zhuying Li
Wen Hu
Aaron Quigley
LRM
38
1
0
14 Aug 2024
Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions
Xinbei Ma
Yiting Wang
Yao Yao
Tongxin Yuan
Aston Zhang
Zhuosheng Zhang
Hai Zhao
AAML
LLMAG
27
17
0
05 Aug 2024
A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges
Yuqi Nie
Yaxuan Kong
Xiaowen Dong
John M. Mulvey
H. Vincent Poor
Qingsong Wen
Stefan Zohren
AIFin
45
42
0
15 Jun 2024
GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning
Zhen Xiang
Linzhi Zheng
Yanjie Li
Junyuan Hong
Qinbin Li
...
Zidi Xiong
Chulin Xie
Carl Yang
Dawn Song
Bo Li
LLMAG
45
23
0
13 Jun 2024
Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis
Matteo Esposito
Francesco Palagiano
Valentina Lenarduzzi
Davide Taibi
93
6
0
11 Jun 2024
CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation
I-Hung Hsu
Zifeng Wang
Long T. Le
Lesly Miculicich
Nanyun Peng
Chen-Yu Lee
Tomas Pfister
HILM
29
4
0
08 Jun 2024
A Survey of Language-Based Communication in Robotics
William Hunt
Sarvapali D. Ramchurn
Mohammad D. Soorati
LM&Ro
62
12
0
06 Jun 2024
AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways
Zehang Deng
Yongjian Guo
Changzhou Han
Wanlun Ma
Junwu Xiong
Sheng Wen
Yang Xiang
44
23
0
04 Jun 2024
Safeguarding Large Language Models: A Survey
Yi Dong
Ronghui Mu
Yanghao Zhang
Siqi Sun
Tianle Zhang
...
Yi Qi
Jinwei Hu
Jie Meng
Saddek Bensalem
Xiaowei Huang
OffRL
KELM
AILaw
35
17
0
03 Jun 2024
OR-Bench: An Over-Refusal Benchmark for Large Language Models
Justin Cui
Wei-Lin Chiang
Ion Stoica
Cho-Jui Hsieh
ALM
38
33
0
31 May 2024
ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
Jingnan Zheng
Han Wang
An Zhang
Tai D. Nguyen
Jun Sun
Tat-Seng Chua
LLMAG
40
14
0
23 May 2024
When LLMs Meet Cybersecurity: A Systematic Literature Review
Jie Zhang
Haoyu Bu
Hui Wen
Yu Chen
Lun Li
Hongsong Zhu
42
36
0
06 May 2024
AI Governance and Accountability: An Analysis of Anthropic's Claude
Aman Priyanshu
Yash Maurya
Zuofei Hong
31
3
0
02 May 2024
Testing and Understanding Erroneous Planning in LLM Agents through Synthesized User Inputs
Zhenlan Ji
Daoyuan Wu
Pingchuan Ma
Zongjie Li
Shuai Wang
LLMAG
48
3
0
27 Apr 2024
Measuring Bargaining Abilities of LLMs: A Benchmark and A Buyer-Enhancement Method
Tian Xia
Zhiwei He
Tong Ren
Yibo Miao
Zhuosheng Zhang
Yang Yang
Rui Wang
43
13
0
24 Feb 2024
Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models
Zhiwei He
Binglin Zhou
Hong-ping Hao
Aiwei Liu
Xing Wang
Zhaopeng Tu
Zhuosheng Zhang
Rui Wang
WaLM
76
18
0
21 Feb 2024
CoCo-Agent: A Comprehensive Cognitive MLLM Agent for Smartphone GUI Automation
Xinbei Ma
Zhuosheng Zhang
Hai Zhao
LLMAG
33
21
0
19 Feb 2024
ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages
Junjie Ye
Sixian Li
Guanyu Li
Caishuang Huang
Songyang Gao
Yilong Wu
Qi Zhang
Tao Gui
Xuanjing Huang
LLMAG
30
16
0
16 Feb 2024
A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
Lingbo Mo
Zeyi Liao
Boyuan Zheng
Yu-Chuan Su
Chaowei Xiao
Huan Sun
AAML
LLMAG
41
15
0
15 Feb 2024
Towards Unified Alignment Between Agents, Humans, and Environment
Zonghan Yang
An Liu
Zijun Liu
Kai Liu
Fangzhou Xiong
...
Zhenhe Zhang
Fuwen Luo
Zhicheng Guo
Peng Li
Yang Liu
29
4
0
12 Feb 2024
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
240
2,494
0
06 Oct 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
322
4,077
0
24 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022