2311.15930
WorldSense: A Synthetic Benchmark for Grounded Reasoning in Large Language Models
27 November 2023
Youssef Benchekroun
Megi Dervishi
Mark Ibrahim
Jean-Baptiste Gaya
Xavier Martinet
Grégoire Mialon
Thomas Scialom
Emmanuel Dupoux
Dieuwke Hupkes
Pascal Vincent
LRM
Papers citing
"WorldSense: A Synthetic Benchmark for Grounded Reasoning in Large Language Models"
4 / 4 papers shown
Title
STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?
Heng Chang
Yuyao Zhang
Tao Lin
Xiangrui Liu
Wenxiao Cai
Zhengyang Liang
Bo Zhao
LRM
58
4
0
31 Mar 2025
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM
ALM
61
57
0
18 Jun 2024
Don't Make Your LLM an Evaluation Benchmark Cheater
Kun Zhou
Yutao Zhu
Zhipeng Chen
Wentong Chen
Wayne Xin Zhao
Xu Chen
Yankai Lin
Ji-Rong Wen
Jiawei Han
ELM
110
138
0
03 Nov 2023
Unbiased Math Word Problems Benchmark for Mitigating Solving Bias
Zhicheng YANG
Jinghui Qin
Jiaqi Chen
Xiaodan Liang
107
12
0
17 May 2022