arXiv:2412.03597
The Vulnerability of Language Model Benchmarks: Do They Accurately Reflect True LLM Performance?
2 December 2024
Sourav Banerjee
Ayushi Agarwal
Eishkaran Singh
ELM
Papers citing
"The Vulnerability of Language Model Benchmarks: Do They Accurately Reflect True LLM Performance?"
A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents
Yuting Huang
Leilei Ding
ZhiPeng Tang
Tianfu Wang
Xinrui Lin
Wenbo Zhang
Mingxiao Ma
Yanyong Zhang
LLMAG
20 Apr 2025