2009.03300
Cited By
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning
Deepanway Ghosal
Yew Ken Chia
Navonil Majumder
Soujanya Poria
ALM
LRM
57
19
0
05 Jul 2023
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
Allen Z. Ren
Anushri Dixit
Alexandra Bodrova
Sumeet Singh
Stephen Tu
...
Jacob Varley
Zhenjia Xu
Dorsa Sadigh
Andy Zeng
Anirudha Majumdar
LM&Ro
286
239
0
04 Jul 2023
CARE-MI: Chinese Benchmark for Misinformation Evaluation in Maternity and Infant Care
Tong Xiang
Liangzhi Li
Wangyue Li
Min‐Jun Bai
Lu Wei
Bowen Wang
Noa Garcia
79
5
0
04 Jul 2023
SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions
Sameera Horawalavithana
Sai Munikoti
Ian Stewart
Henry Kvinge
MLLM
93
19
0
03 Jul 2023
Personality Traits in Large Language Models
Gregory Serapio-García
Mustafa Safdari
Clément Crepy
Luning Sun
Stephen Fitz
P. Romero
Marwa Abdulhai
Aleksandra Faust
Maja J. Matarić
LM&MA
LLMAG
207
127
0
01 Jul 2023
CMATH: Can Your Language Model Pass Chinese Elementary School Math Test?
Tianwen Wei
Jian Luan
Wen Liu
Shuang Dong
Bin Wang
ELM
79
36
0
29 Jun 2023
On the Exploitability of Instruction Tuning
Manli Shu
Jiong Wang
Chen Zhu
Jonas Geiping
Chaowei Xiao
Tom Goldstein
SILM
144
99
0
28 Jun 2023
SCENEREPLICA: Benchmarking Real-World Robot Manipulation by Creating Replicable Scenes
Ninad Khargonkar
Sai Haneesh Allu
Ya Lu
Jishnu Jaykumar
Balakrishnan Prabhakaran
Yu Xiang
38
2
0
27 Jun 2023
Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning
Xiao Ma
Swaroop Mishra
Ahmad Beirami
Alex Beutel
Jilin Chen
ELM
ReLM
LRM
59
12
0
25 Jun 2023
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models
Neel Jain
Khalid Saifullah
Yuxin Wen
John Kirchenbauer
Manli Shu
Aniruddha Saha
Micah Goldblum
Jonas Geiping
Tom Goldstein
ALM
ELM
102
23
0
23 Jun 2023
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs
Miao Xiong
Zhiyuan Hu
Xinyang Lu
Yifei Li
Jie Fu
Junxian He
Bryan Hooi
230
451
0
22 Jun 2023
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
Shizhe Diao
Boyao Wang
Hanze Dong
Kashun Shum
Jipeng Zhang
Wei Xiong
Tong Zhang
ALM
103
66
0
21 Jun 2023
A Simple and Effective Pruning Approach for Large Language Models
Mingjie Sun
Zhuang Liu
Anna Bair
J. Zico Kolter
173
441
0
20 Jun 2023
Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset
S. Naeini
Raeid Saqur
M. Saeidi
John Giorgi
Babak Taati
121
11
0
19 Jun 2023
Toward the Cure of Privacy Policy Reading Phobia: Automated Generation of Privacy Nutrition Labels From Privacy Policies
Shidong Pan
Thong Hoang
Dawen Zhang
Zhenchang Xing
Xiwei Xu
Qinghua Lu
Mark Staples
93
16
0
19 Jun 2023
Thrilled by Your Progress! Large Language Models (GPT-4) No Longer Struggle to Pass Assessments in Higher Education Programming Courses
Jaromír Šavelka
Arav Agarwal
Marshall An
Chris Bogart
M. Sakr
ELM
106
115
0
15 Jun 2023
Inverse Scaling: When Bigger Isn't Better
I. R. McKenzie
Alexander Lyzhov
Michael Pieler
Alicia Parrish
Aaron Mueller
...
Yuhui Zhang
Zhengping Zhou
Najoung Kim
Sam Bowman
Ethan Perez
97
140
0
15 Jun 2023
KoLA: Carefully Benchmarking World Knowledge of Large Language Models
Jifan Yu
Xiaozhi Wang
Shangqing Tu
S. Cao
Daniel Zhang-Li
...
Lei Hou
Zhiyuan Liu
Bin Xu
Jie Tang
Juanzi Li
ELM
ALM
114
69
0
15 Jun 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
Peng Xu
Wenqi Shao
Kaipeng Zhang
Peng Gao
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELM
MLLM
108
174
0
15 Jun 2023
CMMLU: Measuring massive multitask language understanding in Chinese
Haonan Li
Yixuan Zhang
Fajri Koto
Yifei Yang
Hai Zhao
Yeyun Gong
Nan Duan
Tim Baldwin
ALM
ELM
118
273
0
15 Jun 2023
Domain-specific ChatBots for Science using Embeddings
Kevin G. Yager
70
8
0
15 Jun 2023
Revealing the structure of language model capabilities
Ryan Burnell
Hank Hao
Andrew R. A. Conway
José Hernández-Orallo
ELM
82
19
0
14 Jun 2023
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
Arnav Chavan
Zhuang Liu
D. K. Gupta
Eric P. Xing
Zhiqiang Shen
102
92
0
13 Jun 2023
Questioning the Survey Responses of Large Language Models
Ricardo Dominguez-Olmedo
Moritz Hardt
Celestine Mendler-Dünner
133
36
0
13 Jun 2023
Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models
Yin Fang
Xiaozhuan Liang
Ningyu Zhang
Kangwei Liu
Rui Huang
Zhuo Chen
Xiaohui Fan
Huajun Chen
126
87
0
13 Jun 2023
SqueezeLLM: Dense-and-Sparse Quantization
Sehoon Kim
Coleman Hooper
A. Gholami
Zhen Dong
Xiuyu Li
Sheng Shen
Michael W. Mahoney
Kurt Keutzer
MQ
152
198
0
13 Jun 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
587
4,455
0
09 Jun 2023
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models
Wenxuan Zhang
Sharifah Mahani Aljunied
Chang Gao
Yew Ken Chia
Lidong Bing
ELM
131
87
0
08 Jun 2023
INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models
Yew Ken Chia
Pengfei Hong
Lidong Bing
Soujanya Poria
ELM
79
65
0
07 Jun 2023
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
Yizhong Wang
Hamish Ivison
Pradeep Dasigi
Jack Hessel
Tushar Khot
...
David Wadden
Kelsey MacMillan
Noah A. Smith
Iz Beltagy
Hannaneh Hajishirzi
ALM
ELM
120
393
0
07 Jun 2023
PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
Kaijie Zhu
Jindong Wang
Jiaheng Zhou
Zichen Wang
Hao Chen
...
Linyi Yang
Weirong Ye
Yue Zhang
Neil Zhenqiang Gong
Xingxu Xie
SILM
138
144
0
07 Jun 2023
Benchmarking Foundation Models with Language-Model-as-an-Examiner
Yushi Bai
Jiahao Ying
Yixin Cao
Xin Lv
Yuze He
...
Yijia Xiao
Haozhe Lyu
Jiayin Zhang
Juanzi Li
Lei Hou
ALM
ELM
107
149
0
07 Jun 2023
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
Ajay Jaiswal
Shiwei Liu
Tianlong Chen
Zhangyang Wang
VLM
71
34
0
06 Jun 2023
Applying Standards to Advance Upstream & Downstream Ethics in Large Language Models
Jose Berengueres
Marybeth Sandell
68
0
0
06 Jun 2023
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Kenneth Li
Oam Patel
Fernanda Viégas
Hanspeter Pfister
Martin Wattenberg
KELM
HILM
160
584
0
06 Jun 2023
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset
Junling Liu
Peilin Zhou
Yining Hua
Dading Chong
Zhongyu Tian
...
Helin Wang
Chenyu You
Zhenhua Guo
Lei Zhu
Michael Lingzhi Li
LM&MA
ELM
111
79
0
05 Jun 2023
MultiLegalPile: A 689GB Multilingual Legal Corpus
Joel Niklaus
Veton Matoshi
Matthias Sturmer
Ilias Chalkidis
Daniel E. Ho
AILaw
ELM
120
44
0
03 Jun 2023
Reimagining Retrieval Augmented Language Models for Answering Queries
W. Tan
Yuliang Li
Pedro Rodriguez
Rich James
Xi Lin
A. Halevy
Scott Yih
KELM
LRM
98
9
0
01 Jun 2023
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets
Md Tahmid Rahman Laskar
M Saiful Bari
Mizanur Rahman
Md Amran Hossen Bhuiyan
Shafiq Joty
J. Huang
LM&MA
ELM
ALM
125
193
0
29 May 2023
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
Zechun Liu
Barlas Oğuz
Changsheng Zhao
Ernie Chang
Pierre Stock
Yashar Mehdad
Yangyang Shi
Raghuraman Krishnamoorthi
Vikas Chandra
MQ
128
208
0
29 May 2023
Conformal Prediction with Large Language Models for Multi-Choice Question Answering
Bhawesh Kumar
Cha-Chen Lu
Gauri Gupta
Anil Palepu
David R. Bellamy
Ramesh Raskar
Andrew L. Beam
99
77
0
28 May 2023
What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks
Taicheng Guo
Kehan Guo
B. Nan
Zhengwen Liang
Zhichun Guo
Nitesh Chawla
Olaf Wiest
Xiangliang Zhang
ELM
175
142
0
27 May 2023
Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In
Zichun Yu
Chenyan Xiong
S. Yu
Zhiyuan Liu
KELM
VLM
102
69
0
27 May 2023
Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models
Yuhui Zhang
Michihiro Yasunaga
Zhengping Zhou
Jeff Z. HaoChen
James Zou
Percy Liang
Serena Yeung
95
9
0
27 May 2023
Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance
Yao Fu
Litu Ou
Mingyu Chen
Yuhao Wan
Hao-Chun Peng
Tushar Khot
LLMAG
ELM
LRM
ReLM
80
115
0
26 May 2023
Training Socially Aligned Language Models on Simulated Social Interactions
Ruibo Liu
Ruixin Yang
Chenyan Jia
Ge Zhang
Denny Zhou
Andrew M. Dai
Diyi Yang
Soroush Vosoughi
ALM
78
56
0
26 May 2023
Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation
Niels Mündler
Jingxuan He
Slobodan Jenko
Martin Vechev
HILM
70
119
0
25 May 2023
The False Promise of Imitating Proprietary LLMs
Arnav Gudibande
Eric Wallace
Charles Burton Snell
Xinyang Geng
Hao Liu
Pieter Abbeel
Sergey Levine
Dawn Song
ALM
126
208
0
25 May 2023
On Degrees of Freedom in Defining and Testing Natural Language Understanding
Saku Sugawara
S. Tsugita
ELM
83
1
0
24 May 2023
C-STS: Conditional Semantic Textual Similarity
Ameet Deshpande
Carlos E. Jimenez
Howard Chen
Vishvak Murahari
Victoria Graf
Tanmay Rajpurohit
Ashwin Kalyan
Danqi Chen
Karthik Narasimhan
61
3
0
24 May 2023