2502.14127
Cited By
v1
v2 (latest)
Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above
19 February 2025
Nishant Balepur
Rachel Rudinger
Jordan Lee Boyd-Graber
AI4Ed
ELM
Papers citing "Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above"
5 / 5 papers shown
Title
WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models
Abdullah Mushtaq
Imran Taj
Rafay Naeem
Ibrahim Ghaznavi
Junaid Qadir
59
0
0
14 May 2025
BLAB: Brutally Long Audio Bench
Orevaoghene Ahia
Martijn Bartelds
Kabir Ahuja
Hila Gonen
Valentin Hofmann
...
Noah Bennett
Shinji Watanabe
Noah A. Smith
Yulia Tsvetkov
Sachin Kumar
AuLLM
LM&MA
VLM
112
0
0
05 May 2025
COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning
Xindi Wu
Hee Seung Hwang
Polina Kirichenko
Olga Russakovsky
VLM
CoGe
125
1
0
30 Apr 2025
Ethical AI on the Waitlist: Group Fairness Evaluation of LLM-Aided Organ Allocation
Hannah Murray
Brian Hyeongseok Kim
Isabelle Lee
Jason Byun
Dani Yogatama
Evi Micha
82
1
0
29 Mar 2025
Language Models Fail to Introspect About Their Knowledge of Language
Siyuan Song
Jennifer Hu
Kyle Mahowald
LRM
KELM
HILM
ELM
115
4
0
10 Mar 2025