Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1905.00537
Cited By
v1
v2
v3 (latest)
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
2 May 2019
Alex Jinpeng Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems"
50 / 1,500 papers shown
Title
Redefining Evaluation Standards: A Unified Framework for Evaluating the Korean Capabilities of Language Models
Hanwool Albert Lee
Soo Yong Kim
Dasol Choi
Sangwon Baek
Seunghyeok Hong
Ilgyun Jeong
Inseon Hwang
Naeun Lee
Guijin Son
VLM
106
0
0
29 Mar 2025
MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness
Zihao Zheng
Xiuping Cui
Size Zheng
Maoliang Li
Jiayu Chen
Yun Liang
Xiang Chen
MQ
MoE
125
0
0
27 Mar 2025
Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark
Sondos Mahmoud Bsharat
Mukul Ranjan
Aidar Myrzakhan
Jiacheng Liu
Bowei Guo
Shengkun Tang
Zhuang Liu
Yuanzhi Li
Zhiqiang Shen
ELM
118
1
0
26 Mar 2025
Cyborg Data: Merging Human with AI Generated Training Data
Kai North
Christopher Ormerod
68
0
0
26 Mar 2025
CASE -- Condition-Aware Sentence Embeddings for Conditional Semantic Textual Similarity Measurement
Gaifan Zhang
Yi Zhou
Danushka Bollegala
522
0
0
21 Mar 2025
Measuring AI Ability to Complete Long Tasks
Thomas Kwa
Ben West
Joel Becker
Amy Deng
Katharyn Garcia
...
Lucas Jun Koba Sato
H. Wijk
Daniel M. Ziegler
Elizabeth Barnes
Lawrence Chan
ELM
279
18
0
18 Mar 2025
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
Kairong Luo
Haodong Wen
Shengding Hu
Zhenbo Sun
Zhiyuan Liu
Maosong Sun
Kaifeng Lyu
Wenguang Chen
CLL
115
3
0
17 Mar 2025
TLUE: A Tibetan Language Understanding Evaluation Benchmark
Fan Gao
Cheng Huang
Nyima Tashi
Xiangxiang Wang
Thupten Tsering
...
Gadeng Luosang
Rinchen Dongrub
Dorje Tashi
Xiao Feng
Yongbin Yu
ELM
250
2
0
15 Mar 2025
LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama
Naome A. Etori
Kevin Lu
Randu Karisa
Arturs Kanepajs
LRM
ELM
479
0
0
14 Mar 2025
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation
Weihao Xuan
Rui Yang
Heli Qi
Qingcheng Zeng
Yunze Xiao
...
Edison Marrese-Taylor
Shijian Lu
Yusuke Iwasawa
Yutaka Matsuo
Irene Li
ELM
213
7
0
13 Mar 2025
AI-driven control of bioelectric signalling for real-time topological reorganization of cells
Gonçalo Hora de Carvalho
AI4CE
116
0
0
10 Mar 2025
Assessing the Macro and Micro Effects of Random Seeds on Fine-Tuning Large Language Models
Hao Zhou
Guergana Savova
Lijing Wang
122
0
0
10 Mar 2025
Biases in Large Language Model-Elicited Text: A Case Study in Natural Language Inference
Grace Proebsting
Adam Poliak
98
0
0
06 Mar 2025
Tgea: An error-annotated dataset and benchmark tasks for text generation from pretrained language models
Jie He
Bo Peng
Yi-Lun Liao
Qun Liu
Deyi Xiong
109
8
0
06 Mar 2025
Zero-Shot Complex Question-Answering on Long Scientific Documents
Wanting Wang
RALM
80
0
0
04 Mar 2025
SampleMix: A Sample-wise Pre-training Data Mixing Strategey by Coordinating Data Quality and Diversity
Xiangyu Xi
Deyang Kong
Jian Yang
Jiawei Yang
Zheyu Chen
Wei Wang
Jinqiao Wang
Xunliang Cai
Shikun Zhang
Wei Ye
109
0
0
03 Mar 2025
AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses
Nicholas Carlini
Javier Rando
Edoardo Debenedetti
Milad Nasr
F. Tramèr
AAML
ELM
92
3
0
03 Mar 2025
LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM
Yehonathan Refael
Iftach Arbel
Ofir Lindenbaum
Tom Tirer
169
1
0
26 Feb 2025
BIG-Bench Extra Hard
Mehran Kazemi
Bahare Fatemi
Hritik Bansal
John Palowitch
Chrysovalantis Anastasiou
...
Kate Olszewska
Yi Tay
Vinh Q. Tran
Quoc V. Le
Orhan Firat
ELM
LRM
299
13
0
26 Feb 2025
Self-Adjust Softmax
Chuanyang Zheng
Yihang Gao
Guoxuan Chen
Han Shi
Jing Xiong
Xiaozhe Ren
Chao Huang
Xin Jiang
Zhiyu Li
Yu Li
81
1
0
25 Feb 2025
BERTtime Stories: Investigating the Role of Synthetic Story Data in Language Pre-training
Nikitas Theodoropoulos
Giorgos Filandrianos
Vassilis Lyberatos
Maria Lymperaiou
Giorgos Stamou
SyDa
219
1
0
24 Feb 2025
Unveiling Reasoning Thresholds in Language Models: Scaling, Fine-Tuning, and Interpretability through Attention Maps
Yen-Che Hsiao
Abhishek Dutta
LRM
ReLM
ELM
116
0
0
24 Feb 2025
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Rylan Schaeffer
Punit Singh Koura
Binh Tang
R. Subramanian
Aaditya K. Singh
...
Vedanuj Goswami
Sergey Edunov
Dieuwke Hupkes
Sanmi Koyejo
Sharan Narang
ALM
144
1
0
24 Feb 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
210
9
0
24 Feb 2025
Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation
Simin Chen
Yiming Chen
Zexin Li
Yifan Jiang
Zhongwei Wan
...
Dezhi Ran
Tianle Gu
Haoyang Li
Tao Xie
Baishakhi Ray
95
6
0
23 Feb 2025
Recurrent Knowledge Identification and Fusion for Language Model Continual Learning
Yujie Feng
Xujia Wang
Zexin Lu
Shenghong Fu
Guangyuan Shi
Yongxin Xu
Yasha Wang
Philip S. Yu
Xu Chu
Xiao-Ming Wu
CLL
KELM
115
1
0
22 Feb 2025
Understand User Opinions of Large Language Models via LLM-Powered In-the-Moment User Experience Interviews
Mengqiao Liu
Tevin Wang
Cassandra A. Cohen
Sarah Li
Chenyan Xiong
LRM
118
0
0
21 Feb 2025
LSR-Adapt: Ultra-Efficient Parameter Tuning with Matrix Low Separation Rank Kernel Adaptation
Xin Li
Anand D. Sarwate
94
1
0
20 Feb 2025
Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson's Disease
Elliot Schumacher
Dhruv Naik
Anitha Kannan
LM&MA
66
0
0
20 Feb 2025
Reasoning and the Trusting Behavior of DeepSeek and GPT: An Experiment Revealing Hidden Fault Lines in Large Language Models
Rubing Li
João Sedoc
Arun Sundararajan
LRM
106
1
0
20 Feb 2025
MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models
Zhen Zhang
Yue Yang
Kai Zhen
Nathan Susanj
Athanasios Mouchtaris
Siegfried Kunzmann
Zheng Zhang
103
1
0
17 Feb 2025
What Are They Filtering Out? A Survey of Filtering Strategies for Harm Reduction in Pretraining Datasets
Marco Antonio Stranisci
Christian Hardmeier
165
1
0
17 Feb 2025
QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models
Jiajun Zhou
Yifan Yang
Kai Zhen
Ziyue Liu
Yequan Zhao
Ershad Banijamali
Athanasios Mouchtaris
Ngai Wong
Zheng Zhang
MQ
67
0
0
17 Feb 2025
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models
Haoyang Li
Xuejia Chen
Zhanchao Xu
Darian Li
Nicole Hu
...
Yongbin Li
Luyu Qiu
C. Zhang
Qing Li
Lei Chen
ELM
LRM
114
1
0
16 Feb 2025
TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages
Jafar Isbarov
Arofat Akhundjanova
Mammad Hajili
Kavsar Huseynova
Dmitry Gaynullin
...
Amina Alisheva
Aizirek Turdubaeva
Abdullatif Köksal
Samir Rustamov
Duygu Ataman
ELM
76
0
0
16 Feb 2025
DebateBench: A Challenging Long Context Reasoning Benchmark For Large Language Models
Utkarsh Tiwari
Aryan Seth
Adi Mukherjee
Kaavya Mer
Kavish
Dhruv Kumar
ELM
LRM
109
0
0
10 Feb 2025
PiKE: Adaptive Data Mixing for Large-Scale Multi-Task Learning Under Low Gradient Conflicts
Zeman Li
Yuan Deng
Peilin Zhong
Meisam Razaviyayn
Vahab Mirrokni
MoMe
132
1
0
10 Feb 2025
Unbiased Evaluation of Large Language Models from a Causal Perspective
Meilin Chen
Jian Tian
Liang Ma
Di Xie
Weijie Chen
Jiang Zhu
ALM
ELM
159
0
0
10 Feb 2025
RideKE: Leveraging Low-Resource, User-Generated Twitter Content for Sentiment and Emotion Detection in Kenyan Code-Switched Dataset
Naome A. Etori
Maria Gini
166
3
0
10 Feb 2025
Towards Sustainable NLP: Insights from Benchmarking Inference Energy in Large Language Models
S. Poddar
Paramita Koley
Janardan Misra
Niloy Ganguly
Saptarshi Ghosh
Saptarshi Ghosh
145
0
0
08 Feb 2025
M-IFEval: Multilingual Instruction-Following Evaluation
Antoine Dussolle
Andrea Cardeña Díaz
Shota Sato
Peter Devine
ELM
166
0
0
07 Feb 2025
MultiQ&A: An Analysis in Measuring Robustness via Automated Crowdsourcing of Question Perturbations and Answers
Nicole Cho
William Watson
AAML
HILM
280
0
0
06 Feb 2025
Bilevel ZOFO: Bridging Parameter-Efficient and Zeroth-Order Techniques for Efficient LLM Fine-Tuning and Meta-Training
Reza Shirkavand
Qi He
Peiran Yu
Heng-Chiao Huang
ALM
99
0
0
05 Feb 2025
Are Language Models Up to Sequential Optimization Problems? From Evaluation to a Hegelian-Inspired Enhancement
Soheil Abbasloo
LRM
70
0
0
04 Feb 2025
PARA: Parameter-Efficient Fine-tuning with Prompt Aware Representation Adjustment
Zequan Liu
Yi Zhao
Ming Tan
Wei Zhu
Aaron Xuxiang Tian
157
0
0
03 Feb 2025
Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers
Akiyoshi Tomihari
Issei Sato
ODL
153
3
0
31 Jan 2025
Brain network science modelling of sparse neural networks enables Transformers and LLMs to perform as fully connected
Yingtao Zhang
Diego Cerretti
Jialin Zhao
Wenjing Wu
Ziheng Liao
Umberto Michieli
C. Cannistraci
155
1
0
31 Jan 2025
Survey and Improvement Strategies for Gene Prioritization with Large Language Models
Matthew Neeley
Guantong Qi
Guanchu Wang
Ruixiang Tang
Dongxue Mao
...
Bo Yuan
Fan Xia
Pengfei Liu
Zhandong Liu
Helen Zhou
LM&MA
150
2
0
30 Jan 2025
A linguistically-motivated evaluation methodology for unraveling model's abilities in reading comprehension tasks
Elie Antoine
Frédéric Béchet
Géraldine Damnati
Philippe Langlais
164
1
0
29 Jan 2025
BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models
Yibin Wang
Haizhou Shi
Ligong Han
Dimitris N. Metaxas
Hao Wang
BDL
UQLM
226
13
0
28 Jan 2025
Previous
1
2
3
4
5
...
28
29
30
Next