ResearchTrend.AI
arXiv: 2502.14127
Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above

19 February 2025
Nishant Balepur, Rachel Rudinger, Jordan Lee Boyd-Graber
AI4Ed, ELM

Papers citing "Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above"

5 / 5 papers shown
WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models
Abdullah Mushtaq, Imran Taj, Rafay Naeem, Ibrahim Ghaznavi, Junaid Qadir
59
0
0
14 May 2025
BLAB: Brutally Long Audio Bench
Orevaoghene Ahia, Martijn Bartelds, Kabir Ahuja, Hila Gonen, Valentin Hofmann, ..., Noah Bennett, Shinji Watanabe, Noah A. Smith, Yulia Tsvetkov, Sachin Kumar
AuLLM, LM&MA, VLM
112
0
0
05 May 2025
COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning
Xindi Wu, Hee Seung Hwang, Polina Kirichenko, Olga Russakovsky
VLM, CoGe
125
1
0
30 Apr 2025
Ethical AI on the Waitlist: Group Fairness Evaluation of LLM-Aided Organ Allocation
Hannah Murray, Brian Hyeongseok Kim, Isabelle Lee, Jason Byun, Dani Yogatama, Evi Micha
82
1
0
29 Mar 2025
Language Models Fail to Introspect About Their Knowledge of Language
Siyuan Song, Jennifer Hu, Kyle Mahowald
LRM, KELM, HILM, ELM
115
4
0
10 Mar 2025