ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
arXiv:1905.00537

2 May 2019
Alex Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM

Papers citing "SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems"

50 / 497 papers shown
Active Learning of Robot Vision Using Adaptive Path Planning
Julius Ruckin
Federico Magistri
Cyrill Stachniss
Marija Popović
SSL
26
0
0
14 Oct 2024
ELICIT: LLM Augmentation via External In-Context Capability
Futing Wang
Jianhao Yan
Yue Zhang
Tao Lin
44
0
0
12 Oct 2024
StablePrompt: Automatic Prompt Tuning using Reinforcement Learning for Large Language Models
Minchan Kwon
Gaeun Kim
Jongsuk Kim
Haeil Lee
Junmo Kim
OffRL
LRM
LLMAG
26
2
0
10 Oct 2024
PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches
Rana Muhammad Shahroz Khan
Pingzhi Li
Sukwon Yun
Zhenyu Wang
S. Nirjon
Chau-Wai Wong
Tianlong Chen
KELM
43
2
0
08 Oct 2024
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination
Eva Sánchez Salido
Roser Morante
Julio Gonzalo
Guillermo Marco
Jorge Carrillo-de-Albornoz
...
Enrique Amigó
Andrés Fernández
Alejandro Benito-Santos
Adrián Ghajari Espinosa
Victor Fresno
ELM
51
0
0
19 Sep 2024
LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning
Jin Jiang
Yuchen Yan
Yang Liu
Yonggang Jin
Shuai Peng
Hao Fei
Xunliang Cai
Yixin Cao
Liangcai Gao
Zhi Tang
LRM
52
3
0
19 Sep 2024
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
Ilya Gusev
LLMAG
58
3
0
10 Sep 2024
Political DEBATE: Efficient Zero-shot and Few-shot Classifiers for Political Text
Michael Burnham
Kayla Kahn
Ryan Yank Wang
Rachel X. Peng
34
5
0
03 Sep 2024
CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models
Jonathan Bourne
54
4
0
30 Aug 2024
The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design
Artem Snegirev
Maria Tikhonova
Anna Maksimova
Alena Fenogenova
Alexander Abramov
31
4
0
22 Aug 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon
Roi Reichart
40
10
0
27 Jul 2024
Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners
Yifei Gao
Jie Ou
Lei Wang
Fanhua Shang
Jaji Wu
MQ
49
0
0
22 Jul 2024
MSEval: A Dataset for Material Selection in Conceptual Design to Evaluate Algorithmic Models
Yash Jain
Daniele Grandi
Allin Groom
Brandon Cramer
Christopher McComb
44
0
0
12 Jul 2024
Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay
Gonçalo Hora de Carvalho
Oscar Knap
R. Pollice
ReLM
ELM
LRM
34
1
0
12 Jul 2024
Prompting Techniques for Secure Code Generation: A Systematic Investigation
Catherine Tony
Nicolás E. Díaz Ferreyra
Markus Mutas
Salem Dhiff
Riccardo Scandariato
SILM
79
9
0
09 Jul 2024
IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning
Abhinav Joshi
Shounak Paul
Akshat Sharma
Pawan Goyal
Saptarshi Ghosh
Ashutosh Modi
AILaw
ELM
34
7
0
07 Jul 2024
MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization
Yuyan Chen
Zhihao Wen
Ge Fan
Zhengyu Chen
Wei Wu
Dayiheng Liu
Zhixu Li
Bang Liu
Yanghua Xiao
39
18
0
04 Jul 2024
LoPT: Low-Rank Prompt Tuning for Parameter Efficient Language Models
Shouchang Guo
Sonam Damani
Keng-hao Chang
VLM
36
1
0
27 Jun 2024
PaCoST: Paired Confidence Significance Testing for Benchmark Contamination Detection in Large Language Models
Huixuan Zhang
Yun Lin
Xiaojun Wan
50
0
0
26 Jun 2024
Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning
Somnath Basu Roy Chowdhury
Krzysztof Choromanski
Arijit Sehanobish
Avinava Dubey
Snigdha Chaturvedi
MU
61
7
0
24 Jun 2024
Causal Discovery Inspired Unsupervised Domain Adaptation for Emotion-Cause Pair Extraction
Yuncheng Hua
Yujin Huang
Shuo Huang
Tao Feng
Lizhen Qu
Chris Bain
R. Bassed
Gholamreza Haffari
CML
OOD
53
2
0
18 Jun 2024
A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations
Jinqiang Wang
Huansheng Ning
Yi Peng
Qikai Wei
Daniel Tesfai
Wenwei Mao
Tao Zhu
Runhe Huang
LM&MA
AI4MH
ELM
44
5
0
14 Jun 2024
Paraphrasing in Affirmative Terms Improves Negation Understanding
MohammadHossein Rezaei
Eduardo Blanco
44
1
0
11 Jun 2024
Symmetric Dot-Product Attention for Efficient Training of BERT Language Models
Martin Courtois
Malte Ostendorff
Leonhard Hennig
Georg Rehm
39
2
0
10 Jun 2024
Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models
Kalyan Nakka
Jimmy Dani
Nitesh Saxena
48
1
0
08 Jun 2024
Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning
Naibin Gu
Peng Fu
Xiyu Liu
Bowen Shen
Zheng-Shen Lin
Weiping Wang
38
6
0
06 Jun 2024
HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits
Tim Franzmeyer
Aleksandar Shtedritski
Samuel Albanie
Philip Torr
João F. Henriques
Jakob N. Foerster
32
1
0
05 Jun 2024
The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding
Kenneth C. Enevoldsen
Márton Kardos
Niklas Muennighoff
Kristoffer Laigaard Nielbo
42
9
0
04 Jun 2024
A Survey Study on the State of the Art of Programming Exercise Generation using Large Language Models
Eduard Frankford
Ingo Höhn
Clemens Sauerwein
Ruth Breu
ELM
39
2
0
30 May 2024
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
Chankyu Lee
Rajarshi Roy
Mengyao Xu
Jonathan Raiman
M. Shoeybi
Bryan Catanzaro
Ming-Yu Liu
RALM
56
145
0
27 May 2024
Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations
Ziqiao Ma
Zekun Wang
Joyce Chai
58
2
0
22 May 2024
CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations
Jiahao Zhao
Jingwei Zhu
Minghuan Tan
Min Yang
Di Yang
Chenhao Zhang
Guancheng Ye
Chengming Li
Xiping Hu
ELM
40
0
0
16 May 2024
Evaluation of Retrieval-Augmented Generation: A Survey
Hao Yu
Aoran Gan
Kai Zhang
Shiwei Tong
Qi Liu
Zhaofeng Liu
3DV
62
82
0
13 May 2024
Towards a Search Engine for Machines: Unified Ranking for Multiple Retrieval-Augmented Large Language Models
Alireza Salemi
Hamed Zamani
36
4
0
30 Apr 2024
Language Model Cascades: Token-level uncertainty and beyond
Neha Gupta
Harikrishna Narasimhan
Wittawat Jitkrittum
A. S. Rawat
A. Menon
Sanjiv Kumar
UQLM
53
42
0
15 Apr 2024
MSciNLI: A Diverse Benchmark for Scientific Natural Language Inference
Mobashir Sadat
Cornelia Caragea
40
4
0
11 Apr 2024
PRobELM: Plausibility Ranking Evaluation for Language Models
Moy Yuan
Chenxi Whitehouse
Eric Chamoun
Rami Aly
Andreas Vlachos
91
4
0
04 Apr 2024
A Controlled Reevaluation of Coreference Resolution Models
Ian Porada
Xiyuan Zou
Jackie Chi Kit Cheung
35
1
0
31 Mar 2024
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models
Chuang Liu
Renren Jin
Yuqi Ren
Deyi Xiong
ELM
43
0
0
19 Mar 2024
Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction
Ziyang Xu
Keqin Peng
Liang Ding
Dacheng Tao
Xiliang Lu
34
10
0
15 Mar 2024
Advancing Generative AI for Portuguese with Open Decoder Gervásio PT*
Rodrigo Santos
Joao Silva
Luís Gomes
João Rodrigues
António Branco
46
10
0
29 Feb 2024
Acquiring Linguistic Knowledge from Multimodal Input
Theodor Amariucai
Alexander Scott Warstadt
CLL
34
2
0
27 Feb 2024
Balanced Data Sampling for Language Model Training with Clustering
Yunfan Shao
Linyang Li
Zhaoye Fei
Hang Yan
Dahua Lin
Xipeng Qiu
37
8
0
22 Feb 2024
Punctuation Restoration Improves Structure Understanding Without Supervision
Junghyun Min
Minho Lee
Woochul Lee
Yeonsoo Lee
59
1
0
13 Feb 2024
CPSDBench: A Large Language Model Evaluation Benchmark and Baseline for Chinese Public Security Domain
Xin Tong
Bo Jin
Zhi Lin
Binjun Wang
Ting Yu
Qiang Cheng
ELM
22
0
0
11 Feb 2024
Do LLMs Dream of Ontologies?
Marco Bombieri
Paolo Fiorini
Simone Paolo Ponzetto
M. Rospocher
CLL
32
2
0
26 Jan 2024
Semantic Sensitivities and Inconsistent Predictions: Measuring the Fragility of NLI Models
Erik Arakelyan
Zhaoqi Liu
Isabelle Augenstein
AAML
45
9
0
25 Jan 2024
Setting the Record Straight on Transformer Oversmoothing
G. Dovonon
M. Bronstein
Matt J. Kusner
28
5
0
09 Jan 2024
The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model Performance
A. Salinas
Fred Morstatter
45
49
0
08 Jan 2024
D3Former: Jointly Learning Repeatable Dense Detectors and Feature-enhanced Descriptors via Saliency-guided Transformer
Junjie Gao
Pengfei Wang
Qiujie Dong
Qiong Zeng
Shiqing Xin
Caiming Zhang
19
0
0
20 Dec 2023