YourBench: Easy Custom Evaluation Sets for Everyone


2 April 2025
Sumuk Shashidhar
Clémentine Fourrier
Alina Lozovskia
Thomas Wolf
Gokhan Tur
Dilek Hakkani-Tur
arXiv: 2504.01833

Papers citing "YourBench: Easy Custom Evaluation Sets for Everyone"

4 / 4 papers shown
Know Or Not: a library for evaluating out-of-knowledge base robustness
Jessica Foo
Pradyumna Shyama Prasad
Shaun Khoo
67
0
0
19 May 2025
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information
Joshua Harris
Fan Grayson
Felix Feldman
Timothy Laurence
Toby Nonnenmacher
...
Leo Loman
Selina Patel
Thomas Finnie
Samuel Collins
Michael Borowitz
AI4MH, LM&MA, ELM
141
0
0
09 May 2025
Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation
Satyapriya Krishna
Kalpesh Krishna
Anhad Mohananey
Steven Schwarcz
Adam Stambler
Shyam Upadhyay
Manaal Faruqui
ReLM, 3DV, LRM, RALM
99
30
0
28 Jan 2025
Training on the Test Task Confounds Evaluation and Emergence
Ricardo Dominguez-Olmedo
Florian E. Dorner
Moritz Hardt
ELM
154
9
1
10 Jul 2024