YourBench: Easy Custom Evaluation Sets for Everyone

2 April 2025

Papers citing "YourBench: Easy Custom Evaluation Sets for Everyone"

Title
Know Or Not: a library for evaluating out-of-knowledge base robustness
Jessica Foo, Pradyumna Shyama Prasad, Shaun Khoo
67 0 0
19 May 2025

Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information
Joshua Harris, Fan Grayson, Felix Feldman, Timothy Laurence, Toby Nonnenmacher, ..., Leo Loman, Selina Patel, Thomas Finnie, Samuel Collins, Michael Borowitz
AI4MH, LM&MA, ELM
141 0 0
09 May 2025

Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation
Satyapriya Krishna, Kalpesh Krishna, Anhad Mohananey, Steven Schwarcz, Adam Stambler, Shyam Upadhyay, Manaal Faruqui
ReLM, 3DV, LRM, RALM
99 30 0
28 Jan 2025

Training on the Test Task Confounds Evaluation and Emergence
Ricardo Dominguez-Olmedo, Florian E. Dorner, Moritz Hardt
ELM
154 9 1
10 Jul 2024