Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1905.00537
Cited By
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
2 May 2019
Alex Jinpeng Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems"
50 / 497 papers shown
Title
Active Learning of Robot Vision Using Adaptive Path Planning
Julius Ruckin
Federico Magistri
Cyrill Stachniss
Marija Popović
SSL
26
0
0
14 Oct 2024
ELICIT: LLM Augmentation via External In-Context Capability
Futing Wang
Jianhao Yan
Yue Zhang
Tao Lin
44
0
0
12 Oct 2024
StablePrompt: Automatic Prompt Tuning using Reinforcement Learning for Large Language Models
Minchan Kwon
Gaeun Kim
Jongsuk Kim
Haeil Lee
Junmo Kim
OffRL
LRM
LLMAG
26
2
0
10 Oct 2024
PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches
Rana Muhammad Shahroz Khan
Pingzhi Li
Sukwon Yun
Zhenyu Wang
S. Nirjon
Chau-Wai Wong
Tianlong Chen
KELM
43
2
0
08 Oct 2024
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination
Eva Sánchez Salido
Roser Morante
Julio Gonzalo
Guillermo Marco
Jorge Carrillo-de-Albornoz
...
Enrique Amigó
Andrés Fernández
Alejandro Benito-Santos
Adrián Ghajari Espinosa
Victor Fresno
ELM
51
0
0
19 Sep 2024
LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning
Jin Jiang
Yuchen Yan
Yang Liu
Yonggang Jin
Shuai Peng
Hao Fei
Xunliang Cai
Yixin Cao
Liangcai Gao
Zhi Tang
LRM
52
3
0
19 Sep 2024
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
Ilya Gusev
LLMAG
58
3
0
10 Sep 2024
Political DEBATE: Efficient Zero-shot and Few-shot Classifiers for Political Text
Michael Burnham
Kayla Kahn
Ryan Yank Wang
Rachel X. Peng
34
5
0
03 Sep 2024
CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models
Jonathan Bourne
54
4
0
30 Aug 2024
The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design
Artem Snegirev
Maria Tikhonova
Anna Maksimova
Alena Fenogenova
Alexander Abramov
31
4
0
22 Aug 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon
Roi Reichart
40
10
0
27 Jul 2024
Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners
Yifei Gao
Jie Ou
Lei Wang
Fanhua Shang
Jaji Wu
MQ
49
0
0
22 Jul 2024
MSEval: A Dataset for Material Selection in Conceptual Design to Evaluate Algorithmic Models
Yash Jain
Daniele Grandi
Allin Groom
Brandon Cramer
Christopher McComb
44
0
0
12 Jul 2024
Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay
Gonçalo Hora de Carvalho
Oscar Knap
R. Pollice
ReLM
ELM
LRM
34
1
0
12 Jul 2024
Prompting Techniques for Secure Code Generation: A Systematic Investigation
Catherine Tony
Nicolás E. Díaz Ferreyra
Markus Mutas
Salem Dhiff
Riccardo Scandariato
SILM
79
9
0
09 Jul 2024
IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning
Abhinav Joshi
Shounak Paul
Akshat Sharma
Pawan Goyal
Saptarshi Ghosh
Ashutosh Modi
AILaw
ELM
34
7
0
07 Jul 2024
MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization
Yuyan Chen
Zhihao Wen
Ge Fan
Zhengyu Chen
Wei Wu
Dayiheng Liu
Zhixu Li
Bang Liu
Yanghua Xiao
39
18
0
04 Jul 2024
LoPT: Low-Rank Prompt Tuning for Parameter Efficient Language Models
Shouchang Guo
Sonam Damani
Keng-hao Chang
VLM
36
1
0
27 Jun 2024
PaCoST: Paired Confidence Significance Testing for Benchmark Contamination Detection in Large Language Models
Huixuan Zhang
Yun Lin
Xiaojun Wan
50
0
0
26 Jun 2024
Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning
Somnath Basu Roy Chowdhury
Krzysztof Choromanski
Arijit Sehanobish
Avinava Dubey
Snigdha Chaturvedi
MU
61
7
0
24 Jun 2024
Causal Discovery Inspired Unsupervised Domain Adaptation for Emotion-Cause Pair Extraction
Yuncheng Hua
Yujin Huang
Shuo Huang
Tao Feng
Lizhen Qu
Chris Bain
R. Bassed
Gholamreza Haffari
CML
OOD
53
2
0
18 Jun 2024
A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations
Jinqiang Wang
Huansheng Ning
Yi Peng
Qikai Wei
Daniel Tesfai
Wenwei Mao
Tao Zhu
Runhe Huang
LM&MA
AI4MH
ELM
44
5
0
14 Jun 2024
Paraphrasing in Affirmative Terms Improves Negation Understanding
MohammadHossein Rezaei
Eduardo Blanco
44
1
0
11 Jun 2024
Symmetric Dot-Product Attention for Efficient Training of BERT Language Models
Martin Courtois
Malte Ostendorff
Leonhard Hennig
Georg Rehm
39
2
0
10 Jun 2024
Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models
Kalyan Nakka
Jimmy Dani
Nitesh Saxena
48
1
0
08 Jun 2024
Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning
Naibin Gu
Peng Fu
Xiyu Liu
Bowen Shen
Zheng-Shen Lin
Weiping Wang
38
6
0
06 Jun 2024
HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits
Tim Franzmeyer
Aleksandar Shtedritski
Samuel Albanie
Philip Torr
João F. Henriques
Jakob N. Foerster
32
1
0
05 Jun 2024
The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding
Kenneth C. Enevoldsen
Márton Kardos
Niklas Muennighoff
Kristoffer Laigaard Nielbo
42
9
0
04 Jun 2024
A Survey Study on the State of the Art of Programming Exercise Generation using Large Language Models
Eduard Frankford
Ingo Höhn
Clemens Sauerwein
Ruth Breu
ELM
39
2
0
30 May 2024
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
Chankyu Lee
Rajarshi Roy
Mengyao Xu
Jonathan Raiman
M. Shoeybi
Bryan Catanzaro
Ming-Yu Liu
RALM
56
145
0
27 May 2024
Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations
Ziqiao Ma
Zekun Wang
Joyce Chai
58
2
0
22 May 2024
CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations
Jiahao Zhao
Jingwei Zhu
Minghuan Tan
Min Yang
Di Yang
Chenhao Zhang
Guancheng Ye
Chengming Li
Xiping Hu
ELM
40
0
0
16 May 2024
Evaluation of Retrieval-Augmented Generation: A Survey
Hao Yu
Aoran Gan
Kai Zhang
Shiwei Tong
Qi Liu
Zhaofeng Liu
3DV
62
82
0
13 May 2024
Towards a Search Engine for Machines: Unified Ranking for Multiple Retrieval-Augmented Large Language Models
Alireza Salemi
Hamed Zamani
36
4
0
30 Apr 2024
Language Model Cascades: Token-level uncertainty and beyond
Neha Gupta
Harikrishna Narasimhan
Wittawat Jitkrittum
A. S. Rawat
A. Menon
Sanjiv Kumar
UQLM
53
42
0
15 Apr 2024
MSciNLI: A Diverse Benchmark for Scientific Natural Language Inference
Mobashir Sadat
Cornelia Caragea
40
4
0
11 Apr 2024
PRobELM: Plausibility Ranking Evaluation for Language Models
Moy Yuan
Chenxi Whitehouse
Eric Chamoun
Rami Aly
Andreas Vlachos
91
4
0
04 Apr 2024
A Controlled Reevaluation of Coreference Resolution Models
Ian Porada
Xiyuan Zou
Jackie Chi Kit Cheung
35
1
0
31 Mar 2024
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models
Chuang Liu
Renren Jin
Yuqi Ren
Deyi Xiong
ELM
43
0
0
19 Mar 2024
Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction
Ziyang Xu
Keqin Peng
Liang Ding
Dacheng Tao
Xiliang Lu
34
10
0
15 Mar 2024
Advancing Generative AI for Portuguese with Open Decoder Gervásio PT*
Rodrigo Santos
Joao Silva
Luís Gomes
João Rodrigues
António Branco
46
10
0
29 Feb 2024
Acquiring Linguistic Knowledge from Multimodal Input
Theodor Amariucai
Alexander Scott Warstadt
CLL
34
2
0
27 Feb 2024
Balanced Data Sampling for Language Model Training with Clustering
Yunfan Shao
Linyang Li
Zhaoye Fei
Hang Yan
Dahua Lin
Xipeng Qiu
37
8
0
22 Feb 2024
Punctuation Restoration Improves Structure Understanding Without Supervision
Junghyun Min
Minho Lee
Woochul Lee
Yeonsoo Lee
59
1
0
13 Feb 2024
CPSDBench: A Large Language Model Evaluation Benchmark and Baseline for Chinese Public Security Domain
Xin Tong
Bo Jin
Zhi Lin
Binjun Wang
Ting Yu
Qiang Cheng
ELM
22
0
0
11 Feb 2024
Do LLMs Dream of Ontologies?
Marco Bombieri
Paolo Fiorini
Simone Paolo Ponzetto
M. Rospocher
CLL
32
2
0
26 Jan 2024
Semantic Sensitivities and Inconsistent Predictions: Measuring the Fragility of NLI Models
Erik Arakelyan
Zhaoqi Liu
Isabelle Augenstein
AAML
45
9
0
25 Jan 2024
Setting the Record Straight on Transformer Oversmoothing
G. Dovonon
M. Bronstein
Matt J. Kusner
28
5
0
09 Jan 2024
The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model Performance
A. Salinas
Fred Morstatter
45
49
0
08 Jan 2024
D3Former: Jointly Learning Repeatable Dense Detectors and Feature-enhanced Descriptors via Saliency-guided Transformer
Junjie Gao
Pengfei Wang
Qiujie Dong
Qiong Zeng
Shiqing Xin
Caiming Zhang
19
0
0
20 Dec 2023
Previous
1
2
3
4
5
...
8
9
10
Next