Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2410.06703
Cited By
ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents
9 October 2024
Ido Levy
Ben Wiesel
Sami Marreed
Alon Oved
Avi Yaeli
Segev Shlomov
LLMAG
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents"
11 / 11 papers shown
Title
A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?
Ada Chen
Yongjiang Wu
Jingyang Zhang
Shu Yang
Jen-tse Huang
Kun Wang
Wenxuan Wang
Shuai Wang
ELM
12
0
0
16 May 2025
UFO2: The Desktop AgentOS
Chaoyun Zhang
He Huang
Chiming Ni
J. Mu
Si Qin
...
Minghua Ma
Jian-Guang Lou
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
LLMAG
34
0
0
20 Apr 2025
DoomArena: A framework for Testing AI Agents Against Evolving Security Threats
Léo Boisvert
Mihir Bansal
Chandra Kiran Reddy Evuru
Gabriel Huang
Abhay Puri
...
Quentin Cappart
Jason Stanley
Alexandre Lacoste
Alexandre Drouin
Krishnamurthy Dvijotham
32
0
0
18 Apr 2025
REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites
Divyansh Garg
Shaun VanWeelden
Diego Caples
Andis Draguns
Nikil Ravi
...
Youngchul Joo
Jindong Gu
Charles London
Christian Schroeder de Witt
S. Motwani
41
1
0
15 Apr 2025
A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models
Liangbo Ning
Ziran Liang
Zhuohang Jiang
Haohao Qu
Yujuan Ding
...
Xiao Wei
Shanru Lin
Hui Liu
Philip S. Yu
Qing Li
LLMAG
LM&Ro
91
6
0
30 Mar 2025
Towards Trustworthy GUI Agents: A Survey
Yucheng Shi
Wenhao Yu
Wenlin Yao
Wenhu Chen
Ninghao Liu
44
4
0
30 Mar 2025
ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning
Z. Chen
Mintong Kang
Bo-wen Li
AAML
42
3
0
26 Mar 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG
ELM
Presented at
ResearchTrend Connect | LLMAG
on
07 May 2025
95
7
0
20 Mar 2025
SafeArena: Evaluating the Safety of Autonomous Web Agents
Ada Defne Tur
Nicholas Meade
Xing Han Lù
Alejandra Zambrano
Arkil Patel
Esin Durmus
Spandana Gella
Karolina Stañczak
Siva Reddy
LLMAG
ELM
87
2
0
06 Mar 2025
Towards Enterprise-Ready Computer Using Generalist Agent
Sami Marreed
Alon Oved
Avi Yaeli
Segev Shlomov
Ido Levy
Aviad Sela
Asaf Adi
Nir Mashkif
LLMAG
66
1
0
24 Feb 2025
The BrowserGym Ecosystem for Web Agent Research
Thibault Le Sellier De Chezelles
Maxime Gasse
Alexandre Lacoste
Alexandre Drouin
Massimo Caccia
...
Siva Reddy
Quentin Cappart
Graham Neubig
Ruslan Salakhutdinov
Nicolas Chapados
LLMAG
106
9
0
06 Dec 2024
1