ResearchTrend.AI
arXiv:1905.00537

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

2 May 2019
Alex Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
    ELM

Papers citing "SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems"

50 / 1,500 papers shown
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM, LRM
260
1,209
0
17 May 2023
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks
Anas Himmi
Ekhine Irurozki
Nathan Noiry
Stephan Clémençon
Pierre Colombo
193
9
0
17 May 2023
M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models
Chuang Liu
Renren Jin
Yuqi Ren
Linhao Yu
Tianyu Dong
...
Peiyi Zhang
Qingqing Lyu
Xiaowen Su
Qun Liu
Deyi Xiong
ELM, ALM
119
26
0
17 May 2023
AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression
Siyue Wu
Hongzhan Chen
Xiaojun Quan
Qifan Wang
Rui Wang
VLM
86
20
0
17 May 2023
DLUE: Benchmarking Document Language Understanding
Ruoxi Xu
Hongyu Lin
Xinyan Guan
Xianpei Han
Yingfei Sun
Le Sun
ELM
80
0
0
16 May 2023
Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning
Haowen Chen
Yiming Zhang
Qi Zhang
Hantao Yang
Xiaomeng Hu
Xuetao Ma
Yifan YangGong
Jiaqi Zhao
ALM
107
51
0
16 May 2023
It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance
Arjun Subramonian
Xingdi Yuan
Hal Daumé
Su Lin Blodgett
95
18
0
15 May 2023
What's the Meaning of Superhuman Performance in Today's NLU?
Simone Tedeschi
Johan Bos
T. Declerck
Jan Hajic
Daniel Hershcovich
...
Simon Krek
Steven Schockaert
Rico Sennrich
Ekaterina Shutova
Roberto Navigli
ELM, LM&MA, VLM, ReLM, LRM
96
27
0
15 May 2023
Symbol tuning improves in-context learning in language models
Jerry W. Wei
Le Hou
Andrew Kyle Lampinen
Xiangning Chen
Da Huang
...
Xinyun Chen
Yifeng Lu
Denny Zhou
Tengyu Ma
Quoc V. Le
LRM
90
80
0
15 May 2023
STORYWARS: A Dataset and Instruction Tuning Baselines for Collaborative Story Understanding and Generation
Yulun Du
Lydia B. Chilton
88
8
0
14 May 2023
Learning to Reason over Scene Graphs: A Case Study of Finetuning GPT-2 into a Robot Language Model for Grounded Task Planning
Georgia Chalvatzaki
A. Younes
Daljeet Nandha
An T. Le
Leonardo F. R. Ribeiro
Iryna Gurevych
LM&Ro, LRM, LLMAG
114
31
0
12 May 2023
When Giant Language Brains Just Aren't Enough! Domain Pizzazz with Knowledge Sparkle Dust
Minh Le Nguyen
Duy-Hung Nguyen
Shahab Sabahi
Hung Le
Jeffrey Yang
Hajime Hotta
83
1
0
12 May 2023
Advancing Neural Encoding of Portuguese with Transformer Albertina PT-*
João Rodrigues
Luís Gomes
Joao Silva
António Branco
Rodrigo Santos
Henrique Lopes Cardoso
T. Osório
38
44
0
11 May 2023
GeoGLUE: A GeoGraphic Language Understanding Evaluation Benchmark
Dongyang Li
Ruixue Ding
Qiang-Wei Zhang
Zheng Li
Boli Chen
...
Yao Xu
Xin Li
Ning Guo
Fei Huang
Xiaofeng He
ELM, VLM
65
6
0
11 May 2023
ANALOGICAL -- A Novel Benchmark for Long Text Analogy Evaluation in Large Language Models
Thilini Wijesiriwardene
Ruwan Wickramarachchi
Bimal Gajera
Shreeyash Mukul Gowaikar
Chandan Gupta
Aman Chadha
Aishwarya N. Reganti
Amit P. Sheth
Amitava Das
ELM
81
14
0
08 May 2023
ComputeGPT: A computational chat model for numerical problems
Ryan H. Lewis
Junfeng Jiao
27
2
0
08 May 2023
Knowledge Graph Guided Semantic Evaluation of Language Models For User Trust
Kaushik Roy
Tarun Garg
Vedant Palit
Yuxin Zi
Vignesh Narayanan
Amit P. Sheth
67
7
0
08 May 2023
The Current State of Summarization
Fabian Retkowski
78
6
0
08 May 2023
The Best Defense is Attack: Repairing Semantics in Textual Adversarial Examples
Heng Yang
Ke Li
AAML
114
3
0
06 May 2023
Refining the Responses of LLMs by Themselves
Tianqiang Yan
Tiansheng Xu
51
3
0
06 May 2023
Residual Prompt Tuning: Improving Prompt Tuning with Residual Reparameterization
Anastasia Razdaibiedina
Yuning Mao
Rui Hou
Madian Khabsa
M. Lewis
Jimmy Ba
Amjad Almahairi
VLM
79
51
0
06 May 2023
NorBench -- A Benchmark for Norwegian Language Models
David Samuel
Andrey Kutuzov
Samia Touileb
Erik Velldal
Lilja Øvrelid
Egil Rønningstad
Elina Sigdel
Anna Palatkina
93
25
0
06 May 2023
Neuromodulation Gated Transformer
Kobe Knowles
Joshua Bensemann
Diana Benavides-Prado
Vithya Yogarajan
Michael Witbrock
Gillian Dobbie
Yang Chen
58
0
0
05 May 2023
AttentionViz: A Global View of Transformer Attention
Catherine Yeh
Yida Chen
Aoyu Wu
Cynthia Chen
Fernanda Viégas
Martin Wattenberg
ViT
79
55
0
04 May 2023
PTP: Boosting Stability and Performance of Prompt Tuning with Perturbation-Based Regularizer
Lichang Chen
Heng-Chiao Huang
Varun Madhavan
AAML
176
12
0
03 May 2023
CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
Erik Nijkamp
A. Ghobadzadeh
Caiming Xiong
Silvio Savarese
Yingbo Zhou
237
174
0
03 May 2023
ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time
Shangqing Tu
Chunyang Li
Jifan Yu
Xiaozhi Wang
Lei Hou
Juanzi Li
LLMAG, AI4MH
158
10
0
27 Apr 2023
Boosting Big Brother: Attacking Search Engines with Encodings
Nicholas Boucher
Luca Pajola
Ilia Shumailov
Ross J. Anderson
Mauro Conti
SILM
70
10
0
27 Apr 2023
Evaluation of GPT-3.5 and GPT-4 for supporting real-world information needs in healthcare delivery
Debadutta Dash
Rahul Thapa
Juan M. Banda
Akshay Swaminathan
Morgan Cheatham
...
Garret K. Morris
H. Magon
M. Lungren
Eric Horvitz
N. Shah
ELM, LM&MA, AI4MH
137
52
0
26 Apr 2023
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Jingfeng Yang
Hongye Jin
Ruixiang Tang
Xiaotian Han
Qizhang Feng
Haoming Jiang
Bing Yin
Helen Zhou
LM&MA
214
682
0
26 Apr 2023
Introducing MBIB -- the first Media Bias Identification Benchmark Task and Dataset Collection
Martin Wessel
Tomáš Horych
Terry Ruas
Akiko Aizawa
Bela Gipp
Timo Spinde
83
25
0
25 Apr 2023
Domain Mastery Benchmark: An Ever-Updating Benchmark for Evaluating Holistic Domain Knowledge of Large Language Model--A Preliminary Release
Zhouhong Gu
Xiaoxuan Zhu
Haoning Ye
Lin Zhang
Zhuozhi Xiong
Zihan Li
Qi He
Sihang Jiang
Hongwei Feng
Yanghua Xiao
ELM, ALM
74
2
0
23 Apr 2023
SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models
Jonathan Roberts
Kai Han
Samuel Albanie
VLM
94
14
0
23 Apr 2023
LaMP: When Large Language Models Meet Personalization
Alireza Salemi
Sheshera Mysore
Michael Bendersky
Hamed Zamani
RALM
127
240
0
22 Apr 2023
MixPro: Simple yet Effective Data Augmentation for Prompt-based Learning
Bohan Li
Longxu Dou
Yutai Hou
Yunlong Feng
Honglin Mu
Qingfu Zhu
Qinghua Sun
Wanxiang Che
VLM
74
4
0
19 Apr 2023
Exploring the Trade-Offs: Unified Large Language Models vs Local Fine-Tuned Models for Highly-Specific Radiology NLI Task
Zihao Wu
Lu Zhang
Chao-Yang Cao
Xiao-Xing Yu
Haixing Dai
...
Quanzheng Li
Dinggang Shen
Xiang Li
Dajiang Zhu
Tianming Liu
LM&MA
66
39
0
18 Apr 2023
Tool Learning with Foundation Models
Yujia Qin
Shengding Hu
Yankai Lin
Weize Chen
Ning Ding
...
Cheng Yang
Tongshuang Wu
Heng Ji
Zhiyuan Liu
Maosong Sun
146
222
0
17 Apr 2023
Dialogue Games for Benchmarking Language Understanding: Motivation, Taxonomy, Strategy
David Schlangen
ELM
83
15
0
14 Apr 2023
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
Wanjun Zhong
Ruixiang Cui
Yiduo Guo
Yaobo Liang
Shuai Lu
Yanlin Wang
Amin Saied
Weizhu Chen
Nan Duan
ALM, ELM
135
550
0
13 Apr 2023
Global Prompt Cell: A Portable Control Module for Effective Prompt Tuning
Chi-Liang Liu
Hao Wang
Nuwa Xi
Sendong Zhao
Bing Qin
VLM
69
1
0
12 Apr 2023
FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement
Xiaonan Nie
Xupeng Miao
Zilong Wang
Zichao Yang
Jilong Xue
Lingxiao Ma
Gang-Ming Cao
Tengjiao Wang
MoE
89
50
0
08 Apr 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan
Chan Jun Shern
Andy Zou
Nathaniel Li
Steven Basart
Thomas Woodside
Jonathan Ng
Hanlin Zhang
Scott Emmons
Dan Hendrycks
113
136
0
06 Apr 2023
Static Fuzzy Bag-of-Words: a lightweight sentence embedding algorithm
Matteo Muffo
Roberto Tedesco
L. Sbattella
Vincenzo Scotti
13
0
0
06 Apr 2023
Evaluating GPT-4 and ChatGPT on Japanese Medical Licensing Examinations
Jungo Kasai
Y. Kasai
Keisuke Sakaguchi
Yutaro Yamada
Dragomir R. Radev
LM&MA, ELM
66
107
0
31 Mar 2023
AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators
Xingwei He
Zheng-Wen Lin
Yeyun Gong
Alex Jin
Hang Zhang
Chen Lin
Jian Jiao
Siu-Ming Yiu
Nan Duan
Weizhu Chen
119
201
0
29 Mar 2023
Soft-prompt tuning to predict lung cancer using primary care free-text Dutch medical notes
Auke Elfrink
Iacopo Vagliano
A. Abu-Hanna
Iacer Calixto
57
5
0
28 Mar 2023
Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
Vladislav Lialin
Vijeta Deshpande
Anna Rumshisky
104
179
0
28 Mar 2023
Learning Expressive Prompting With Residuals for Vision Transformers
Rajshekhar Das
Yonatan Dukler
Avinash Ravichandran
A. Swaminathan
VLM, VPVLM
71
22
0
27 Mar 2023
Natural Language Reasoning, A Survey
Fei Yu
Hongbo Zhang
Prayag Tiwari
Benyou Wang
ReLM, LRM
171
63
0
26 Mar 2023
Task-oriented Memory-efficient Pruning-Adapter
Guorun Wang
Jun Yang
Yaoru Sun
44
4
0
26 Mar 2023