arXiv 2305.08414
What's the Meaning of Superhuman Performance in Today's NLU?
15 May 2023
Simone Tedeschi
Johan Bos
T. Declerck
Jan Hajic
Daniel Hershcovich
Eduard H. Hovy
Alexander Koller
Simon Krek
Steven Schockaert
Rico Sennrich
Ekaterina Shutova
Roberto Navigli
Tags: ELM, LM&MA, VLM, ReLM, LRM
Papers citing "What's the Meaning of Superhuman Performance in Today's NLU?" (20 of 20 shown)
1. The Call for Socially Aware Language Technologies. Diyi Yang, Dirk Hovy, David Jurgens, Barbara Plank. 24 Feb 2025. [VLM]
2. A Survey of Text Classification Under Class Distribution Shift. Adriana Valentina Costache, Silviu Florin Gheorghe, Eduard Poesina, Paul Irofti, Radu Tudor Ionescu. 18 Feb 2025. [OOD, VLM]
3. A linguistically-motivated evaluation methodology for unraveling model's abilities in reading comprehension tasks. Elie Antoine, Frédéric Béchet, Géraldine Damnati, Philippe Langlais. 29 Jan 2025.
4. BQA: Body Language Question Answering Dataset for Video Large Language Models. Shintaro Ozaki, Kazuki Hayashi, Miyu Oba, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe. 17 Oct 2024.
5. On the Evaluation Practices in Multilingual NLP: Can Machine Translation Offer an Alternative to Human Translations? Rochelle Choenni, Sara Rajaee, Christof Monz, Ekaterina Shutova. 20 Jun 2024.
6. mCSQA: Multilingual Commonsense Reasoning Dataset with Unified Creation Strategy by Language Models and Humans. Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe. 06 Jun 2024. [LRM]
7. ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models. Aparna Elangovan, Ling Liu, Lei Xu, S. Bodapati, Dan Roth. 28 May 2024. [ELM]
8. Natural Language Processing RELIES on Linguistics. Juri Opitz, Shira Wein, Nathan Schneider. 09 May 2024. [AI4CE]
9. PATCH -- Psychometrics-AssisTed benCHmarking of Large Language Models: A Case Study of Mathematics Proficiency. Qixiang Fang, Daniel L. Oberski, Dong Nguyen. 02 Apr 2024.
10. Predictions from language models for multiple-choice tasks are not robust under variation of scoring methods. Polina Tsvilodub, Hening Wang, Sharon Grosch, Michael Franke. 01 Mar 2024.
11. How the Advent of Ubiquitous Large Language Models both Stymie and Turbocharge Dynamic Adversarial Question Generation. Yoo Yeon Sung, Ishani Mondal, Jordan L. Boyd-Graber. 20 Jan 2024.
12. Establishing Trustworthiness: Rethinking Tasks and Model Evaluation. Robert Litschko, Max Müller-Eberstein, Rob van der Goot, Leon Weber, Barbara Plank. 09 Oct 2023. [LRM]
13. BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models. Wei Qi Leong, Jian Gang Ngui, Yosephine Susanto, Hamsawardhini Rengarajan, Kengatharaiyer Sarveswaran, William-Chandra Tjhi. 12 Sep 2023.
14. Active Learning Principles for In-Context Learning with Large Language Models. Katerina Margatina, Timo Schick, Nikolaos Aletras, Jane Dwivedi-Yu. 23 May 2023.
15. DynaSent: A Dynamic Benchmark for Sentiment Analysis. Christopher Potts, Zhengxuan Wu, Atticus Geiger, Douwe Kiela. 30 Dec 2020.
16. To Test Machine Comprehension, Start by Defining Comprehension. Jesse Dunietz, Greg Burnham, Akash Bharadwaj, Owen Rambow, Jennifer Chu-Carroll, D. Ferrucci. 04 May 2020. [FaML]
17. Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets. Mor Geva, Yoav Goldberg, Jonathan Berant. 21 Aug 2019.
18. e-SNLI: Natural Language Inference with Natural Language Explanations. Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, Phil Blunsom. 04 Dec 2018. [LRM]
19. Hypothesis Only Baselines in Natural Language Inference. Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, Benjamin Van Durme. 02 May 2018.
20. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. 20 Apr 2018. [ELM]