2410.11672
Cited By
Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers
15 October 2024
Lorenzo Pacchiardi
Marko Tesic
Lucy G. Cheke
José Hernández-Orallo
Papers citing
"Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers"
3 / 3 papers shown
Title
Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark
Jasper Götting
Pedro Medeiros
Jon G Sanders
Nathaniel Li
Long Phan
Karam Elabd
Lennart Justen
Dan Hendrycks
Seth Donoughe
ELM
55
2
0
21 Apr 2025
Re-evaluating Theory of Mind evaluation in large language models
Jennifer Hu
Felix Sosa
T. Ullman
45
0
0
28 Feb 2025
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Maria Eriksson
Erasmo Purificato
Arman Noroozian
Joao Vinagre
Guillaume Chaslot
Emilia Gomez
David Fernandez Llorca
ELM
139
1
0
10 Feb 2025