2404.16966
Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks
25 April 2024
Melissa Ailem
Katerina Marazopoulou
Charlotte Siska
James Bono
Papers citing
"Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks"
14 / 14 papers shown
LLM-Evaluation Tropes: Perspectives on the Validity of LLM-Evaluations
Laura Dietz
Oleg Zendel
P. Bailey
Charles L. A. Clarke
Ellese Cotterill
Jeff Dalton
Faegheh Hasibi
Mark Sanderson
Nick Craswell
ELM
43
0
0
27 Apr 2025
Adapting Large Language Models for Multi-Domain Retrieval-Augmented-Generation
Alexandre Misrahi
Nadezhda Chirkova
Maxime Louis
Vassilina Nikoulina
RALM
82
0
0
03 Apr 2025
SelfPrompt: Autonomously Evaluating LLM Robustness via Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts
Aihua Pei
Zehua Yang
Shunan Zhu
Ruoxi Cheng
Ju Jia
AAML
72
2
0
01 Dec 2024
Shortcut Learning in In-Context Learning: A Survey
Rui Song
Yingji Li
Fausto Giunchiglia
Hao Xu
38
1
0
04 Nov 2024
Language Model Preference Evaluation with Multiple Weak Evaluators
Zhengyu Hu
Jieyu Zhang
Zhihan Xiong
Alexander Ratner
Hui Xiong
Ranjay Krishna
46
3
0
14 Oct 2024
100 instances is all you need: predicting the success of a new LLM on unseen data by testing on a few instances
Lorenzo Pacchiardi
Lucy G. Cheke
José Hernández Orallo
ALM
LRM
ELM
36
3
0
05 Sep 2024
On-Device Language Models: A Comprehensive Review
Jiajun Xu
Zhiyuan Li
Wei Chen
Qun Wang
Xin Gao
Qi Cai
Ziyuan Ling
42
27
0
26 Aug 2024
What Did I Do Wrong? Quantifying LLMs' Sensitivity and Consistency to Prompt Engineering
Federico Errica
G. Siracusano
D. Sanvito
Roberto Bifulco
76
19
0
18 Jun 2024
KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs
Aihua Pei
Zehua Yang
Shunan Zhu
Ruoxi Cheng
Ju Jia
Lina Wang
39
1
0
16 Jun 2024
Don't Make Your LLM an Evaluation Benchmark Cheater
Kun Zhou
Yutao Zhu
Zhipeng Chen
Wentong Chen
Wayne Xin Zhao
Xu Chen
Yankai Lin
Ji-Rong Wen
Jiawei Han
ELM
105
136
0
03 Nov 2023
Competence-Based Analysis of Language Models
Adam Davies
Jize Jiang
Chengxiang Zhai
ELM
24
4
0
01 Mar 2023
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
250
1,073
0
05 Oct 2022
Measure and Improve Robustness in NLP Models: A Survey
Xuezhi Wang
Haohan Wang
Diyi Yang
139
130
0
15 Dec 2021
Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel
Nazneen Rajani
Jesse Vig
Samson Tan
Jason M. Wu
Stephan Zheng
Caiming Xiong
Mohit Bansal
Christopher Ré
AAML
OffRL
OOD
146
136
0
13 Jan 2021