SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

2 May 2019
Alex Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
    ELM

Papers citing "SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems"

50 / 1,500 papers shown
PolyLM: An Open Source Polyglot Large Language Model
Xiangpeng Wei
Hao-Ran Wei
Huan Lin
Tianhao Li
Pei Zhang
...
Yu Bowen
Dayiheng Liu
Baosong Yang
Fei Huang
Jun Xie
LRM
107
61
0
12 Jul 2023
Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features
Ester Hlavnova
Sebastian Ruder
80
5
0
11 Jul 2023
Automated Essay Scoring in Argumentative Writing: DeBERTeachingAssistant
Yann Hicke
Tonghua Tian
Karan Jha
Choong Hee Kim
59
2
0
09 Jul 2023
Exploring and Characterizing Large Language Models For Embedded System Development and Debugging
Zachary Englhardt
Rong-Hua Li
Dilini Nissanka
Zhihan Zhang
Girish Narayanswamy
Joseph Breda
Xin Liu
Shwetak N. Patel
Vikram Iyer
89
20
0
07 Jul 2023
BiPhone: Modeling Inter Language Phonetic Influences in Text
Abhirut Gupta
Ananya B. Sai
R. Sproat
Yuri Vasilevski
James Ren
Ambarish Jash
Sukhdeep S. Sodhi
A. Raghuveer
30
0
0
06 Jul 2023
VideoGLUE: Video General Understanding Evaluation of Foundation Models
Liangzhe Yuan
N. B. Gundavarapu
Long Zhao
Hao Zhou
Huayu Chen
...
Florian Schroff
Hartwig Adam
Ming-Hsuan Yang
Ting Liu
Boqing Gong
ELM
85
10
0
06 Jul 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM, LM&MA, ALM
223
1,759
0
06 Jul 2023
Text Alignment Is An Efficient Unified Model for Massive NLP Tasks
Yuheng Zha
Yichi Yang
Ruichen Li
Zhiting Hu
ALM
67
11
0
06 Jul 2023
Several categories of Large Language Models (LLMs): A Short Survey
Saurabh Pahune
Manoj Chandrasekharan
AILaw
47
17
0
05 Jul 2023
Won't Get Fooled Again: Answering Questions with False Premises
Shengding Hu
Yi-Xiao Luo
Huadong Wang
Xingyi Cheng
Zhiyuan Liu
Maosong Sun
90
29
0
05 Jul 2023
CARE-MI: Chinese Benchmark for Misinformation Evaluation in Maternity and Infant Care
Tong Xiang
Liangzhi Li
Wangyue Li
Min‐Jun Bai
Lu Wei
Bowen Wang
Noa Garcia
77
5
0
04 Jul 2023
SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding
Vasilisa Bashlovkina
Riley Matthews
Zhaobin Kuang
Simon Baumgartner
Michael Bendersky
66
5
0
30 Jun 2023
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Lijun Yu
Yong Cheng
Zhiruo Wang
Vivek Kumar
Wolfgang Macherey
...
Yonatan Bisk
Ming-Hsuan Yang
Kevin Patrick Murphy
Alexander G. Hauptmann
Lu Jiang
MLLM
97
52
0
30 Jun 2023
Benchmarking Large Language Model Capabilities for Conditional Generation
Joshua Maynez
Priyanka Agrawal
Sebastian Gehrmann
ELM, LM&MA
92
31
0
29 Jun 2023
Principles and Guidelines for Evaluating Social Robot Navigation Algorithms
Anthony G. Francis
Claudia Pérez-D'Arpino
Chengshu Li
Fei Xia
Alexandre Alahi
...
Xuesu Xiao
Peng Xu
Naoki Yokoyama
Alexander Toshev
Roberto Martin-Martin
120
77
0
29 Jun 2023
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models
Avinash Madasu
Vasudev Lal
CoGe
91
3
0
28 Jun 2023
SCENEREPLICA: Benchmarking Real-World Robot Manipulation by Creating Replicable Scenes
Ninad Khargonkar
Sai Haneesh Allu
Ya Lu
Jishnu Jaykumar
Balakrishnan Prabhakaran
Yu Xiang
38
2
0
27 Jun 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM, ObjD, VLM
128
765
0
26 Jun 2023
Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset
S. Naeini
Raeid Saqur
M. Saeidi
John Giorgi
Babak Taati
115
11
0
19 Jun 2023
MARBLE: Music Audio Representation Benchmark for Universal Evaluation
Ruibin Yuan
Yi Ma
Yizhi Li
Ge Zhang
Xingran Chen
...
Si Liu
Shi Wang
Ruibo Liu
Yi-Ting Guo
Jie Fu
155
34
0
18 Jun 2023
ActiveGLAE: A Benchmark for Deep Active Learning with Transformers
Lukas Rauch
Matthias Aßenmacher
Denis Huseljic
Moritz Wirth
Bernd Bischl
Bernhard Sick
89
13
0
16 Jun 2023
Full Parameter Fine-tuning for Large Language Models with Limited Resources
Kai Lv
Yuqing Yang
Tengxiao Liu
Qi-jie Gao
Qipeng Guo
Xipeng Qiu
128
134
0
16 Jun 2023
Inverse Scaling: When Bigger Isn't Better
I. R. McKenzie
Alexander Lyzhov
Michael Pieler
Alicia Parrish
Aaron Mueller
...
Yuhui Zhang
Zhengping Zhou
Najoung Kim
Sam Bowman
Ethan Perez
96
140
0
15 Jun 2023
Explore, Establish, Exploit: Red Teaming Language Models from Scratch
Stephen Casper
Jason Lin
Joe Kwon
Gatlen Culp
Dylan Hadfield-Menell
AAML
53
99
0
15 Jun 2023
KoLA: Carefully Benchmarking World Knowledge of Large Language Models
Jifan Yu
Xiaozhi Wang
Shangqing Tu
S. Cao
Daniel Zhang-Li
...
Lei Hou
Zhiyuan Liu
Bin Xu
Jie Tang
Juanzi Li
ELM, ALM
114
69
0
15 Jun 2023
CMMLU: Measuring massive multitask language understanding in Chinese
Haonan Li
Yixuan Zhang
Fajri Koto
Yifei Yang
Hai Zhao
Yeyun Gong
Nan Duan
Tim Baldwin
ALM, ELM
116
273
0
15 Jun 2023
Opportunities for Large Language Models and Discourse in Engineering Design
Jan Göpfert
J. Weinand
Patrick Kuckertz
D. Stolten
AI4CE
65
5
0
15 Jun 2023
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
Yusuke Ijima
Taichi Asami
Marc Delcroix
Yukinori Honma
SSL, ELM
79
11
0
14 Jun 2023
Operationalising Representation in Natural Language Processing
J. Harding
121
13
0
14 Jun 2023
Language models are not naysayers: An analysis of language models on negation benchmarks
Thinh Hung Truong
Timothy Baldwin
Karin Verspoor
Trevor Cohn
122
60
0
14 Jun 2023
NoCoLA: The Norwegian Corpus of Linguistic Acceptability
Matias Jentoft
David Samuel
70
12
0
13 Jun 2023
Probing Quantifier Comprehension in Large Language Models: Another Example of Inverse Scaling
Akshat Gupta
ELM, LRM
63
7
0
12 Jun 2023
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute
William Chen
Xuankai Chang
Yifan Peng
Zhaoheng Ni
Soumi Maiti
Shinji Watanabe
SSL
95
27
0
11 Jun 2023
GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model
Shicheng Tan
Weng Lam Tam
Yuanchun Wang
Wenwen Gong
Yang Yang
...
Jiahao Liu
Jingang Wang
Shuo Zhao
Peng Zhang
Jie Tang
ALM, MoE
80
13
0
11 Jun 2023
Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method
Shicheng Tan
Weng Lam Tam
Yuanchun Wang
Wenwen Gong
Shuo Zhao
Peng Zhang
Jie Tang
VLM
49
1
0
11 Jun 2023
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models
Wenxuan Zhang
Sharifah Mahani Aljunied
Chang Gao
Yew Ken Chia
Lidong Bing
ELM
129
87
0
08 Jun 2023
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations
Lifan Yuan
Yangyi Chen
Ganqu Cui
Hongcheng Gao
Fangyuan Zou
Xingyi Cheng
Heng Ji
Zhiyuan Liu
Maosong Sun
144
84
0
07 Jun 2023
PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
Kaijie Zhu
Jindong Wang
Jiaheng Zhou
Zichen Wang
Hao Chen
...
Linyi Yang
Weirong Ye
Yue Zhang
Neil Zhenqiang Gong
Xingxu Xie
SILM
135
144
0
07 Jun 2023
Benchmarking Foundation Models with Language-Model-as-an-Examiner
Yushi Bai
Jiahao Ying
Yixin Cao
Xin Lv
Yuze He
...
Yijia Xiao
Haozhe Lyu
Jiayin Zhang
Juanzi Li
Lei Hou
ALM, ELM
107
149
0
07 Jun 2023
Model Spider: Learning to Rank Pre-Trained Models Efficiently
Yi-Kai Zhang
Ting Huang
Yao-Xiang Ding
De-Chuan Zhan
Han-Jia Ye
117
28
0
06 Jun 2023
Causal interventions expose implicit situation models for commonsense language understanding
Takateru Yamakoshi
James L. McClelland
A. Goldberg
Robert D. Hawkins
93
6
0
06 Jun 2023
An Approach to Solving the Abstraction and Reasoning Corpus (ARC) Challenge
Tan John Chong Min
RALM, LRM
35
6
0
06 Jun 2023
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset
Junling Liu
Peilin Zhou
Yining Hua
Dading Chong
Zhongyu Tian
...
Helin Wang
Chenyu You
Zhenhua Guo
Lei Zhu
Michael Lingzhi Li
LM&MA, ELM
111
79
0
05 Jun 2023
On "Scientific Debt" in NLP: A Case for More Rigour in Language Model Pre-Training Research
Made Nindyatama Nityasya
Haryo Akbarianto Wibowo
Alham Fikri Aji
Genta Indra Winata
Radityo Eko Prasojo
Phil Blunsom
A. Kuncoro
60
8
0
05 Jun 2023
LexGPT 0.1: pre-trained GPT-J models with Pile of Law
Jieh-Sheng Lee
AILaw
61
11
0
05 Jun 2023
bgGLUE: A Bulgarian General Language Understanding Evaluation Benchmark
Momchil Hardalov
Pepa Atanasova
Todor Mihaylov
G. Angelova
K. Simov
P. Osenova
Ves Stoyanov
Ivan Koychev
Preslav Nakov
Dragomir R. Radev
ELM, FedML
75
4
0
04 Jun 2023
MultiLegalPile: A 689GB Multilingual Legal Corpus
Joel Niklaus
Veton Matoshi
Matthias Sturmer
Ilias Chalkidis
Daniel E. Ho
AILaw, ELM
118
44
0
03 Jun 2023
Transfer learning for atomistic simulations using GNNs and kernel mean embeddings
Johannes Falk
L. Bonati
P. Novelli
Michele Parrinello
Massimiliano Pontil
132
5
0
02 Jun 2023
Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting
Shubin Huang
Qiong Wu
Yiyi Zhou
Weijie Chen
Rongsheng Zhang
Xiaoshuai Sun
Rongrong Ji
VLM, VPVLM, LRM
43
0
0
01 Jun 2023
Measuring the Robustness of NLP Models to Domain Shifts
Nitay Calderon
Naveh Porat
Eyal Ben-David
Alexander Chapanin
Zorik Gekhman
Nadav Oved
Vitaly Shalumov
Roi Reichart
124
8
0
31 May 2023