ResearchTrend.AI
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
arXiv: 1905.00537

2 May 2019
Alex Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
    ELM

Papers citing "SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems"

50 / 1,500 papers shown
Finance Language Model Evaluation (FLaME)
Glenn Matlin
Mika Okamoto
Huzaifa Pardawala
Yang Yang
Sudheer Chava
AIFin LRM
23
0
0
18 Jun 2025
RE-IMAGINE: Symbolic Benchmark Synthesis for Reasoning Evaluation
Xinnuo Xu
Rachel Lawrence
Kshitij Dubey
Atharva Pandey
Risa Ueno
Fabian Falck
A. Nori
Rahul Sharma
Amit Sharma
Javier González
LRM
14
0
0
18 Jun 2025
Understand the Implication: Learning to Think for Pragmatic Understanding
S. Sravanthi
Kishan Maharaj
Sravani Gunnu
Abhijit Mishra
Pushpak Bhattacharyya
ReLM LRM
19
0
0
16 Jun 2025
MEraser: An Effective Fingerprint Erasure Approach for Large Language Models
Jingxuan Zhang
Zhenhua Xu
Rui Hu
Wenpeng Xing
Xuhong Zhang
Meng Han
AAML
12
0
0
14 Jun 2025
OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems
Xiaozhe Li
Jixuan Chen
Xinyu Fang
Shengyuan Ding
Haodong Duan
Qingwen Liu
Kai-xiang Chen
LLMAG LRM
96
0
0
12 Jun 2025
Beyond Benchmarks: A Novel Framework for Domain-Specific LLM Evaluation and Knowledge Mapping
Nitin Sharma
Thomas Wolfers
Çağatay Yıldız
ALM
10
0
0
09 Jun 2025
Exploring the Impact of Temperature on Large Language Models: Hot or Cold?
Lujun Li
Lama Sleem
Niccolo Gentile
Geoffrey Nichil
Radu State
10
0
0
08 Jun 2025
Leveraging Self-Attention for Input-Dependent Soft Prompting in LLMs
Ananth Muppidi
Abhilash Nandy
Sambaran Bandyopadhyay
14
0
0
05 Jun 2025
MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark
Junjie Xing
Yeye He
Mengyu Zhou
Haoyu Dong
Shi Han
Lingjiao Chen
Dongmei Zhang
S. Chaudhuri
H. V. Jagadish
LMTD ELM LRM
32
0
0
05 Jun 2025
EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving
Shihan Dou
Ming Zhang
Chenhao Huang
Jiayi Chen
F. Chen
...
Wei Chengzhi
Lin Yan
Qi Zhang
Xuanjing Huang
ELM
70
0
0
03 Jun 2025
Adaptive Task Vectors for Large Language Models
Joonseong Kang
Soojeong Lee
Subeen Park
Sumin Park
Taero Kim
Jihee Kim
Ryunyi Lee
Kyungwoo Song
27
0
0
03 Jun 2025
Can AI Master Econometrics? Evidence from Econometrics AI Agent on Expert-Level Tasks
Qiang Chen
Tianyang Han
Jin Li
Ye Luo
Yuxiao Wu
Xiaowei Zhang
Tuo Zhou
42
0
0
01 Jun 2025
MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs
Gabrielle Kaili-May Liu
Gal Yona
Avi Caciularu
Idan Szpektor
Tim G. J. Rudner
Arman Cohan
26
0
0
30 May 2025
SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training
Yehonathan Refael
Guy Smorodinsky
Tom Tirer
Ofir Lindenbaum
27
0
0
30 May 2025
Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking
Yuatyong Chaichana
Thanapat Trachu
Peerat Limkonchotiwat
Konpat Preechakul
Tirasan Khandhawit
Ekapol Chuangsuwanich
MoMe
63
0
0
29 May 2025
Efficient Ensemble for Fine-tuning Language Models on Multiple Datasets
Dongyue Li
Ziniu Zhang
Lu Wang
Hongyang R. Zhang
28
1
0
28 May 2025
THINK-Bench: Evaluating Thinking Efficiency and Chain-of-Thought Quality of Large Reasoning Models
Zhiyuan Li
Yi-Ju Chang
Yuan Wu
LLMAG LRM
74
0
0
28 May 2025
Budget-Adaptive Adapter Tuning in Orthogonal Subspaces for Continual Learning in LLMs
Zhiyi Wan
Wanrou Du
Liang Li
Miao Pan
Xiaoqi Qin
CLL
22
0
0
28 May 2025
Research Community Perspectives on "Intelligence" and Large Language Models
Bertram Højer
Terne Sasha Thorn Jakobsen
Anna Rogers
Stefan Heinrich
36
0
0
27 May 2025
Information-Theoretic Complementary Prompts for Improved Continual Text Classification
Duzhen Zhang
Yong Ren
Chenxing Li
Dong Yu
Tielin Zhang
CLL VLM
88
0
0
27 May 2025
ESLM: Risk-Averse Selective Language Modeling for Efficient Pretraining
Melis Ilayda Bal
Volkan Cevher
Michael Muehlebach
41
0
0
26 May 2025
Turing Test 2.0: The General Intelligence Threshold
Georgios Mappouras
ELM
37
0
0
26 May 2025
KerZOO: Kernel Function Informed Zeroth-Order Optimization for Accurate and Accelerated LLM Fine-Tuning
Zhendong Mi
Qitao Tan
Xiaodong Yu
Zining Zhu
Geng Yuan
Shaoyi Huang
206
0
0
24 May 2025
GIM: Improved Interpretability for Large Language Models
Joakim Edin
Róbert Csordás
Tuukka Ruotsalo
Zhengxuan Wu
Maria Maistro
Jing-ling Huang
Lars Maaløe
124
0
0
23 May 2025
Understanding Differential Transformer Unchains Pretrained Self-Attentions
Chaerin Kong
Jiho Jang
Nojun Kwak
80
0
0
22 May 2025
Procedural Environment Generation for Tool-Use Agents
Michael Sullivan
Mareike Hartmann
Alexander Koller
SyDa
13
0
0
21 May 2025
A Federated Splitting Framework for LLMs: Security, Efficiency, and Adaptability
Zishuai Zhang
Hainan Zhang
JiaYing Zheng
Ziwei Wang
Yongxin Tong
Jin Dong
Zhiming Zheng
FedML
75
0
0
21 May 2025
TurnaboutLLM: A Deductive Reasoning Benchmark from Detective Games
Yuan Yuan
Muyu He
Muhammad Adil Shahid
Jiani Huang
Ziyang Li
Li Zhang
LRM
49
0
0
21 May 2025
R3: Robust Rubric-Agnostic Reward Models
David Anugraha
Zilu Tang
Lester James V. Miranda
Hanyang Zhao
Mohammad Rifqi Farhansyah
Garry Kuwanto
Derry Wijaya
Genta Indra Winata
209
1
0
19 May 2025
Fine-tuning Quantized Neural Networks with Zeroth-order Optimization
Sifeng Shang
Jiayi Zhou
Chenyu Lin
Minxian Li
Kaiyang Zhou
MQ
60
0
0
19 May 2025
Introspective Growth: Automatically Advancing LLM Expertise in Technology Judgment
Siyang Wu
Honglin Bao
Nadav Kunievsky
James A. Evans
123
0
0
18 May 2025
Evaluations at Work: Measuring the Capabilities of GenAI in Use
Brandon Lepine
Gawesha Weerantunga
Juho Kim
Pamela Mishkin
Matthew Beane
64
0
0
15 May 2025
On the Evaluation of Engineering Artificial General Intelligence
Sandeep Neema
Susmit Jha
Adam Nagel
Ethan Lew
Chandrasekar Sureshkumar
Aleksa Gordic
Chase Shimmin
Hieu Nguygen
Paul Eremenko
ELM
53
0
0
15 May 2025
Towards Contamination Resistant Benchmarks
Rahmatullah Musawi
Sheng Lu
145
0
0
13 May 2025
Measuring Hong Kong Massive Multi-Task Language Understanding
Chuxue Cao
Zhenghao Zhu
Junqi Zhu
Guoying Lu
Siyu Peng
Juntao Dai
Weijie Shi
Sirui Han
Yike Guo
ELM
444
0
0
04 May 2025
Token-free Models for Sarcasm Detection
Sumit Mamtani
Maitreya Sonawane
Kanika Agarwal
Nishanth Sanjeev
90
0
0
02 May 2025
Generative AI in Education: Student Skills and Lecturer Roles
Stefanie Krause
Ashish Dalvi
Syed Khubaib Zaidi
450
0
0
28 Apr 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM ELM
249
7
0
26 Apr 2025
FLUKE: A Linguistically-Driven and Task-Agnostic Framework for Robustness Evaluation
Yulia Otmakhova
Hung Thinh Truong
Rahmad Mahendra
Zenan Zhai
Rongxin Zhu
Daniel Beck
Jey Han Lau
ELM
154
0
0
24 Apr 2025
Auditing the Ethical Logic of Generative AI Models
W. Russell Neuman
Chad Coleman
Ali Dasdan
Safinah Ali
Manan Shah
ELM LRM
118
1
0
24 Apr 2025
UrbanPlanBench: A Comprehensive Urban Planning Benchmark for Evaluating Large Language Models
Yu Zheng
Longyi Liu
Yuming Lin
Jie Feng
Guozhen Zhang
Depeng Jin
Yong Li
ELM
130
1
0
23 Apr 2025
ViQA-COVID: COVID-19 Machine Reading Comprehension Dataset for Vietnamese
H. Phung
Ngoc C. Lê
Van-Chien Nguyen
Hang Thi Nguyen
Thuy Phuong Thi Nguyen
220
2
0
21 Apr 2025
DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain
Enhao Huang
Rainy Sun
Anya Reese
Alex Chen
...
Gang Zhao
Garry Zhao
Frank Li
Hobert Wong
Lowes Yang
ALM ELM
116
0
0
18 Apr 2025
ZeroSumEval: Scaling LLM Evaluation with Inter-Model Competition
Haidar Khan
H. A. Alyahya
Yazeed Alnumay
M Saiful Bari
B. Yener
ELM LRM
88
0
0
17 Apr 2025
CPG-EVAL: A Multi-Tiered Benchmark for Evaluating the Chinese Pedagogical Grammar Competence of Large Language Models
Dong Wang
ELM
57
0
0
17 Apr 2025
Myanmar XNLI: Building a Dataset and Exploring Low-resource Approaches to Natural Language Inference with Myanmar
Aung Kyaw Htet
Mark Dras
46
1
0
13 Apr 2025
Can the capability of Large Language Models be described by human ability? A Meta Study
Mingrui Zan
Yunquan Zhang
Boyang Zhang
Fangming Liu
Daning Cheng
ELM LM&MA
84
1
0
13 Apr 2025
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt
Aaron Mueller
Leshem Choshen
E. Wilcox
Chengxu Zhuang
...
Rafael Mosquera
Bhargavi Paranjape
Adina Williams
Tal Linzen
Ryan Cotterell
198
121
0
10 Apr 2025
Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation
Biao Zhang
Fedor Moiseev
Joshua Ainslie
Paul Suganthan
Min Ma
Surya Bhupatiraju
Fede Lebron
Orhan Firat
Armand Joulin
Zhe Dong
AI4CE
44
0
0
08 Apr 2025
ORAL: Prompting Your Large-Scale LoRAs via Conditional Recurrent Diffusion
Rana Muhammad Shahroz Khan
Dongwen Tang
Pingzhi Li
Kai Wang
Tianlong Chen
AI4CE
522
1
0
31 Mar 2025