ResearchTrend.AI
arXiv: 2412.03597
The Vulnerability of Language Model Benchmarks: Do They Accurately Reflect True LLM Performance?

2 December 2024
Sourav Banerjee
Ayushi Agarwal
Eishkaran Singh
    ELM

Papers citing "The Vulnerability of Language Model Benchmarks: Do They Accurately Reflect True LLM Performance?"

1 / 1 papers shown
Title
A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents
Yuting Huang
Leilei Ding
ZhiPeng Tang
Tianfu Wang
Xinrui Lin
Wenbo Zhang
Mingxiao Ma
Yanyong Zhang
LLMAG
40
0
0
20 Apr 2025