2009.03300
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs
Deepak Narayanan
Keshav Santhanam
Peter Henderson
Rishi Bommasani
Tony Lee
Percy Liang
192
3
0
03 May 2023
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Jingfeng Yang
Hongye Jin
Ruixiang Tang
Xiaotian Han
Qizhang Feng
Haoming Jiang
Bing Yin
Helen Zhou
LM&MA
214
682
0
26 Apr 2023
Measuring Massive Multitask Chinese Understanding
Hui Zeng
ALM
ELM
AILaw
102
29
0
25 Apr 2023
Why Does ChatGPT Fall Short in Providing Truthful Answers?
Shen Zheng
Jie Huang
Kevin Chen-Chuan Chang
HILM
AI4MH
115
56
0
20 Apr 2023
LongForm: Effective Instruction Tuning with Reverse Instructions
Abdullatif Köksal
Timo Schick
Anna Korhonen
Hinrich Schütze
SyDa
ALM
162
40
0
17 Apr 2023
From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning
Qian Liu
Fan Zhou
Zhengbao Jiang
Longxu Dou
Min Lin
91
17
0
17 Apr 2023
nanoLM: an Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales
Yiqun Yao
Siqi Fan
Xiusheng Huang
Xuezhi Fang
Xiang Li
...
Peng Han
Shuo Shang
Kang Liu
Aixin Sun
Yequan Wang
78
6
0
14 Apr 2023
Learning Personalized Decision Support Policies
Umang Bhatt
Valerie Chen
Katherine M. Collins
Parameswaran Kamalaruban
Emma Kallina
Adrian Weller
Ameet Talwalkar
OffRL
222
11
0
13 Apr 2023
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
Wanjun Zhong
Ruixiang Cui
Yiduo Guo
Yaobo Liang
Shuai Lu
Yanlin Wang
Amin Saied
Weizhu Chen
Nan Duan
ALM
ELM
135
550
0
13 Apr 2023
Can Large Language Models Transform Computational Social Science?
Caleb Ziems
William B. Held
Omar Shaikh
Jiaao Chen
Zhehao Zhang
Diyi Yang
LLMAG
128
322
0
12 Apr 2023
Boosted Prompt Ensembles for Large Language Models
Silviu Pitis
Michael Ruogu Zhang
Andrew Wang
Jimmy Ba
LRM
LLMAG
70
43
0
12 Apr 2023
LLMMaps -- A Visual Metaphor for Stratified Evaluation of Large Language Models
Patrik Puchert
Poonam Poonam
Christian van Onzenoodt
Timo Ropinski
64
9
0
02 Apr 2023
BloombergGPT: A Large Language Model for Finance
Shijie Wu
Ozan Irsoy
Steven Lu
Vadim Dabravolski
Mark Dredze
Sebastian Gehrmann
P. Kambadur
David S. Rosenberg
Gideon Mann
AIFin
246
853
0
30 Mar 2023
Whose Opinions Do Language Models Reflect?
Shibani Santurkar
Esin Durmus
Faisal Ladhak
Cinoo Lee
Percy Liang
Tatsunori Hashimoto
115
448
0
30 Mar 2023
Natural Language Reasoning, A Survey
Fei Yu
Hongbo Zhang
Prayag Tiwari
Benyou Wang
ReLM
LRM
171
63
0
26 Mar 2023
kNN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference
Benfeng Xu
Quan Wang
Zhendong Mao
Yajuan Lyu
Qiaoqiao She
Yongdong Zhang
140
53
0
24 Mar 2023
Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
Vithursan Thangarasa
Shreyas Saxena
Abhay Gupta
Sean Lie
110
5
0
21 Mar 2023
Language Model Behavior: A Comprehensive Survey
Tyler A. Chang
Benjamin Bergen
VLM
LRM
LM&MA
111
109
0
20 Mar 2023
eP-ALM: Efficient Perceptual Augmentation of Language Models
Mustafa Shukor
Corentin Dancette
Matthieu Cord
MLLM
VLM
72
31
0
20 Mar 2023
Capabilities of GPT-4 on Medical Challenge Problems
Harsha Nori
Nicholas King
S. McKinney
Dean Carignan
Eric Horvitz
LM&MA
ELM
AI4MH
156
812
0
20 Mar 2023
Large Language Model Instruction Following: A Survey of Progresses and Challenges
Renze Lou
Kai Zhang
Wenpeng Yin
ALM
LRM
159
25
0
18 Mar 2023
Can Generative Pre-trained Transformers (GPT) Pass Assessments in Higher Education Programming Courses?
Jaromír Šavelka
Arav Agarwal
Chris Bogart
Yifan Song
M. Sakr
ELM
59
99
0
16 Mar 2023
ART: Automatic multi-step reasoning and tool-use for large language models
Bhargavi Paranjape
Scott M. Lundberg
Sameer Singh
Hannaneh Hajishirzi
Luke Zettlemoyer
Marco Tulio Ribeiro
KELM
ReLM
LRM
99
154
0
16 Mar 2023
The Learnability of In-Context Learning
Noam Wies
Yoav Levine
Amnon Shashua
186
109
0
14 Mar 2023
Generating multiple-choice questions for medical question answering with distractors and cue-masking
Damien Sileo
Kanimozhi Uma
Marie-Francine Moens
70
5
0
13 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
1.6K
13,533
0
27 Feb 2023
Testing AI on language comprehension tasks reveals insensitivity to underlying meaning
Vittoria Dentella
Fritz Guenther
Elliot Murphy
G. Marcus
Evelina Leivada
ELM
115
31
0
23 Feb 2023
Complex QA and language models hybrid architectures, Survey
Xavier Daull
P. Bellot
Emmanuel Bruno
Vincent Martin
Elisabeth Murisasco
ELM
212
16
0
17 Feb 2023
Augmented Language Models: a Survey
Grégoire Mialon
Roberto Dessì
Maria Lomeli
Christoforos Nalmpantis
Ramakanth Pasunuru
...
Jane Dwivedi-Yu
Asli Celikyilmaz
Edouard Grave
Yann LeCun
Thomas Scialom
LRM
KELM
99
391
0
15 Feb 2023
STREET: A Multi-Task Structured Reasoning and Explanation Benchmark
D. Ribeiro
Shen Wang
Xiaofei Ma
He Zhu
Rui Dong
...
William Yang Wang
Zhiheng Huang
George Karypis
Bing Xiang
Dan Roth
LRM
ReLM
82
23
0
13 Feb 2023
Can GPT-3 Perform Statutory Reasoning?
Andrew Blair-Stanek
Nils Holzenberger
Benjamin Van Durme
ELM
LRM
116
100
0
13 Feb 2023
Mathematical Capabilities of ChatGPT
Simon Frieder
Luca Pinchetti
Alexis Chevalier
Ryan-Rhys Griffiths
Tommaso Salvatori
Thomas Lukasiewicz
P. Petersen
Julius Berner
ELM
AI4MH
141
433
0
31 Jan 2023
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
Shayne Longpre
Le Hou
Tu Vu
Albert Webson
Hyung Won Chung
...
Denny Zhou
Quoc V. Le
Barret Zoph
Jason W. Wei
Adam Roberts
ALM
122
678
0
31 Jan 2023
LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain
Joel Niklaus
Veton Matoshi
Pooja Rani
Andrea Galassi
Matthias Sturmer
Ilias Chalkidis
ELM
AILaw
100
60
0
30 Jan 2023
REPLUG: Retrieval-Augmented Black-Box Language Models
Weijia Shi
Sewon Min
Michihiro Yasunaga
Minjoon Seo
Rich James
M. Lewis
Luke Zettlemoyer
Wen-tau Yih
RALM
VLM
KELM
266
645
0
30 Jan 2023
ThoughtSource: A central hub for large language model reasoning data
Simon Ott
Konstantin Hebenstreit
Valentin Liévin
C. Hother
M. Moradi
Maximilian Mayrhauser
Robert Praas
Ole Winther
Matthias Samwald
ReLM
LRM
146
46
0
27 Jan 2023
MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding
Steven H. Wang
Antoine Scardigli
Leonard Tang
Wei Chen
D.M. Levkin
Anya Chen
Spencer Ball
Thomas Woodside
Oliver Zhang
Dan Hendrycks
AILaw
ELM
70
22
0
02 Jan 2023
Inconsistencies in Masked Language Models
Tom Young
Yunan Chen
Yang You
74
2
0
30 Dec 2022
Large Language Models Encode Clinical Knowledge
K. Singhal
Shekoofeh Azizi
T. Tu
S. S. Mahdavi
Jason W. Wei
...
A. Rajkomar
Joelle Barral
Christopher Semturs
Alan Karthikesalingam
Vivek Natarajan
LM&MA
ELM
AI4MH
284
2,414
0
26 Dec 2022
Quality at the Tail of Machine Learning Inference
Zhengxin Yang
Wanling Gao
Chunjie Luo
Lei Wang
Fei Tang
Xu Wen
Jianfeng Zhan
76
1
0
25 Dec 2022
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
Srinivasan Iyer
Xi Lin
Ramakanth Pasunuru
Todor Mihaylov
Daniel Simig
...
Jeff Wang
Christopher Dewan
Asli Celikyilmaz
Luke Zettlemoyer
Veselin Stoyanov
ALM
198
268
0
22 Dec 2022
ORCA: A Challenging Benchmark for Arabic Language Understanding
AbdelRahim Elmadany
El Moatez Billah Nagoudi
Muhammad Abdul-Mageed
ELM
109
45
0
21 Dec 2022
A Survey of Deep Learning for Mathematical Reasoning
Pan Lu
Liang Qiu
Wenhao Yu
Sean Welleck
Kai-Wei Chang
ReLM
LRM
133
150
0
20 Dec 2022
Evaluating Human-Language Model Interaction
Mina Lee
Megha Srivastava
Amelia Hardy
John Thickstun
Esin Durmus
...
Hancheng Cao
Tony Lee
Rishi Bommasani
Michael S. Bernstein
Percy Liang
LM&MA
ALM
108
102
0
19 Dec 2022
ALERT: Adapting Language Models to Reasoning Tasks
Ping Yu
Tianlu Wang
O. Yu. Golovneva
Badr AlKhamissi
Siddharth Verma
Zhijing Jin
Gargi Ghosh
Mona T. Diab
Asli Celikyilmaz
ReLM
LRM
85
19
0
16 Dec 2022
On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
Omar Shaikh
Hongxin Zhang
William B. Held
Michael S. Bernstein
Diyi Yang
ReLM
LRM
162
200
0
15 Dec 2022
Automaton-Based Representations of Task Knowledge from Generative Language Models
Yunhao Yang
Jean-Raphael Gaglione
Cyrus Neary
Ufuk Topcu
118
11
0
04 Dec 2022
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Guangxuan Xiao
Ji Lin
Mickael Seznec
Hao Wu
Julien Demouth
Song Han
MQ
255
844
0
18 Nov 2022
Galactica: A Large Language Model for Science
Ross Taylor
Marcin Kardas
Guillem Cucurull
Thomas Scialom
Anthony Hartshorn
Elvis Saravia
Andrew Poulton
Viktor Kerkez
Robert Stojnic
ELM
ReLM
128
785
0
16 Nov 2022
Calibrated Interpretation: Confidence Estimation in Semantic Parsing
Elias Stengel-Eskin
Benjamin Van Durme
UQLM
160
25
0
14 Nov 2022