arXiv:2405.00823
WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting
1 May 2024
Olly Styles
Sam Miller
Patricio Cerda-Mardini
T. Guha
Victor Sanchez
Bertie Vidgen
LLMAG
Papers citing "WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting"
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Zhaoxin Fan
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
26 Apr 2025
Towards Evaluating Large Language Models for Graph Query Generation
Siraj Munir
Alessandro Aldini
ELM
13 Nov 2024
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
Kung-Hsiang Huang
Akshara Prabhakar
Sidharth Dhawan
Yixin Mao
Huan Wang
Silvio Savarese
Caiming Xiong
Philippe Laban
C. Wu
04 Nov 2024