Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark

24 May 2019
Nikita Nangia, Samuel R. Bowman
ELM, ALM

Papers citing "Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark"

42 papers

TLoRA: Tri-Matrix Low-Rank Adaptation of Large Language Models
Tanvir Islam
AI4CE
25 Apr 2025

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt, Aaron Mueller, Leshem Choshen, E. Wilcox, Chengxu Zhuang, ..., Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Ryan Cotterell
10 Apr 2025

Neuro-Symbolic Contrastive Learning for Cross-domain Inference
Mingyue Liu, Ryo Ueda, Zhen Wan, Katsumi Inoue, Chris G. Willcocks
NAI
13 Feb 2025

Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Michael Y. Hu, Aaron Mueller, Candace Ross, Adina Williams, Tal Linzen, Chengxu Zhuang, Ryan Cotterell, Leshem Choshen, Alex Warstadt, Ethan Gotlieb Wilcox
06 Dec 2024

RuBLiMP: Russian Benchmark of Linguistic Minimal Pairs
Ekaterina Taktasheva, Maxim Bazhukov, Kirill Koncha, Alena Fenogenova, Ekaterina Artemova, Vladislav Mikhailov
27 Jun 2024

What Makes Language Models Good-enough?
Daiki Asami, Saku Sugawara
06 Jun 2024

A synthetic data approach for domain generalization of NLI models
Mohammad Javad Hosseini, Andrey Petrov, Alex Fabrikant, Annie Louis
SyDa
19 Feb 2024

The Case for Scalable, Data-Driven Theory: A Paradigm for Scientific Progress in NLP
Julian Michael
01 Dec 2023

Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
Lu Yin, Ajay Jaiswal, Shiwei Liu, Souvik Kundu, Zhangyang Wang
29 Sep 2023

The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Don Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, Madian Khabsa
31 Aug 2023

Foundation Model-oriented Robustness: Robust Image Model Evaluation with Pretrained Models
Peiyan Zhang, Hao Liu, Chaozhuo Li, Xing Xie, Sunghun Kim, Haohan Wang
VLM, OOD
21 Aug 2023

What's the Meaning of Superhuman Performance in Today's NLU?
Simone Tedeschi, Johan Bos, T. Declerck, Jan Hajic, Daniel Hershcovich, ..., Simon Krek, Steven Schockaert, Rico Sennrich, Ekaterina Shutova, Roberto Navigli
ELM, LM&MA, VLM, ReLM, LRM
15 May 2023

Are Machine Rationales (Not) Useful to Humans? Measuring and Improving Human Utility of Free-Text Rationales
Brihi Joshi, Ziyi Liu, Sahana Ramnath, Aaron Chan, Zhewei Tong, Shaoliang Nie, Qifan Wang, Yejin Choi, Xiang Ren
HAI, LRM
11 May 2023

A Human Subject Study of Named Entity Recognition (NER) in Conversational Music Recommendation Queries
Elena V. Epure, Romain Hennequin
13 Mar 2023

A Challenging Benchmark for Low-Resource Learning
Yudong Wang, Chang Ma, Qingxiu Dong, Lingpeng Kong, Jingjing Xu
07 Mar 2023

RuCoLA: Russian Corpus of Linguistic Acceptability
Vladislav Mikhailov, T. Shamardina, Max Ryabinin, A. Pestova, I. Smurov, Ekaterina Artemova
23 Oct 2022

State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, ..., Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin
06 Oct 2022

HumanAL: Calibrating Human Matching Beyond a Single Task
Roee Shraga
HAI
06 May 2022

Testing the limits of natural language models for predicting human language judgments
Tal Golan, Matthew Siegelman, N. Kriegeskorte, Christopher A. Baldassano
07 Apr 2022

NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis
Shamsuddeen Hassan Muhammad, David Ifeoluwa Adelani, Sebastian Ruder, I. Ahmad, Idris Abdulmumin, ..., Chris C. Emezue, Saheed Abdul, Anuoluwapo Aremu, Alipio Jeorge, P. Brazdil
20 Jan 2022

The Defeat of the Winograd Schema Challenge
Vid Kocijan, E. Davis, Thomas Lukasiewicz, G. Marcus, L. Morgenstern
07 Jan 2022

How not to Lie with a Benchmark: Rearranging NLP Leaderboards
Tatiana Shavrina, Valentin Malykh
ALM, ELM
02 Dec 2021

Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models
Wei Ping, Chejian Xu, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, Yangqiu Song
VLM, ELM, AAML
04 Nov 2021

CLUES: Few-Shot Learning Evaluation in Natural Language Understanding
Subhabrata Mukherjee, Xiaodong Liu, Guoqing Zheng, Saghar Hosseini, Hao Cheng, Greg Yang, Christopher Meek, Ahmed Hassan Awadallah, Jianfeng Gao
ELM
04 Nov 2021

IndoNLI: A Natural Language Inference Dataset for Indonesian
Rahmad Mahendra, Alham Fikri Aji, Samuel Louvan, Fahrurrozi Rahman, Clara Vania
27 Oct 2021

Investigating Transfer Learning in Multilingual Pre-trained Language Models through Chinese Natural Language Inference
Hai Hu, He Zhou, Zuoyu Tian, Yiwen Zhang, Yina Ma, Yanting Li, Yixin Nie, Kyle Richardson
07 Jun 2021

Comparing Test Sets with Item Response Theory
Clara Vania, Phu Mon Htut, William Huang, Dhara Mungra, Richard Yuanzhe Pang, Jason Phang, Haokun Liu, Kyunghyun Cho, Sam Bowman
01 Jun 2021

KLUE: Korean Language Understanding Evaluation
Sungjoon Park, Jihyung Moon, Sungdong Kim, Won Ik Cho, Jiyoon Han, ..., Seonghyun Kim, Lucy Park, Alice Oh, Jung-Woo Ha, Kyunghyun Cho
ELM, VLM
20 May 2021

Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks
Tatiana Iazykova, Denis Kapelyushnik, Olga Bystrova, Andrey Kutuzov
ELM
03 May 2021

Sensitivity as a Complexity Measure for Sequence Classification Tasks
Michael Hahn, Dan Jurafsky, Richard Futrell
21 Apr 2021

What Will it Take to Fix Benchmarking in Natural Language Understanding?
Samuel R. Bowman, George E. Dahl
ELM, ALM
05 Apr 2021

OCNLI: Original Chinese Natural Language Inference
Hai Hu, Kyle Richardson, Liang Xu, Lu Li, Sandra Kübler, L. Moss
12 Oct 2020

What Can We Learn from Collective Human Opinions on Natural Language Inference Data?
Yixin Nie, Xiang Zhou, Joey Tianyi Zhou
07 Oct 2020

How Can We Accelerate Progress Towards Human-like Linguistic Generalization?
Tal Linzen
03 May 2020

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, Melvin Johnson
ELM
24 Mar 2020

What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge
Kyle Richardson, Ashish Sabharwal
31 Dec 2019

Learning to Learn Words from Visual Scenes
Dídac Surís, Dave Epstein, Heng Ji, Shih-Fu Chang, Carl Vondrick
VLM, CLIP, SSL, OffRL
25 Nov 2019

BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance
R. Thomas McCoy, Junghyun Min, Tal Linzen
07 Nov 2019

A Pragmatics-Centered Evaluation Framework for Natural Language Understanding
Damien Sileo, Tim Van de Cruys, Camille Pradel, Philippe Muller
ELM
19 Jul 2019

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Jinpeng Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM
02 May 2019

Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
R. Thomas McCoy, Ellie Pavlick, Tal Linzen
04 Feb 2019

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM
20 Apr 2018