ResearchTrend.AI
Enhancing LLM Evaluations: The Garbling Trick

3 November 2024
William F. Bradley
LRM, ELM

Papers citing "Enhancing LLM Evaluations: The Garbling Trick"

8 / 8 papers shown
LLMs and the Madness of Crowds
William F. Bradley
41
2
0
03 Nov 2024
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
Iman Mirzadeh
Keivan Alizadeh
Hooman Shahrokhi
Oncel Tuzel
Samy Bengio
Mehrdad Farajtabar
AIMat, LRM
146
186
0
07 Oct 2024
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Yubo Wang
Xueguang Ma
Ge Zhang
Yuansheng Ni
Abhranil Chandra
...
Kai Wang
Alex Zhuang
Rongqi Fan
Xiang Yue
Wenhu Chen
LRM, ELM
169
465
0
03 Jun 2024
Can Perplexity Reflect Large Language Model's Ability in Long Text Understanding?
Yutong Hu
Quzhe Huang
Mingxu Tao
Chen Zhang
Yansong Feng
94
31
0
09 May 2024
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM, OffRL, LRM
425
4,609
0
27 Oct 2021
An Ensemble of Simple Convolutional Neural Network Models for MNIST Digit Recognition
Sanghyeon An
Min Jun Lee
Sanglee Park
H. Yang
Jungmin So
87
79
0
12 Aug 2020
HellaSwag: Can a Machine Really Finish Your Sentence?
Rowan Zellers
Ari Holtzman
Yonatan Bisk
Ali Farhadi
Yejin Choi
214
2,537
0
19 May 2019
Know What You Don't Know: Unanswerable Questions for SQuAD
Pranav Rajpurkar
Robin Jia
Percy Liang
RALM, ELM
324
2,858
0
11 Jun 2018