2009.03300
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Large Language Models are Temporal and Causal Reasoners for Video Question Answering
Dohwan Ko
Ji Soo Lee
Wooyoung Kang
Byungseok Roh
Hyunwoo J. Kim
LRM
103
40
0
24 Oct 2023
KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval
Marah Abdin
Suriya Gunasekar
Varun Chandrasekaran
Jerry Li
Mert Yuksekgonul
Rahee Peshawaria
Ranjita Naik
Besmira Nushi
88
12
0
24 Oct 2023
FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions
Hyunwoo J. Kim
Melanie Sclar
Xuhui Zhou
Ronan Le Bras
Gunhee Kim
Yejin Choi
Maarten Sap
LLMAG
86
92
0
24 Oct 2023
Irreducible Curriculum for Language Model Pretraining
Simin Fan
Martin Jaggi
135
11
0
23 Oct 2023
S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models
Fangyu Lei
Qian Liu
Yiming Huang
Shizhu He
Jun Zhao
Kang Liu
ELM
LRM
70
13
0
23 Oct 2023
Unveiling A Core Linguistic Region in Large Language Models
Jun Zhao
Zhihao Zhang
Yide Ma
Qi Zhang
Tao Gui
Luhui Gao
Xuanjing Huang
119
6
0
23 Oct 2023
ALCUNA: Large Language Models Meet New Knowledge
Xunjian Yin
Baizhou Huang
Xiaojun Wan
101
27
0
23 Oct 2023
Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction Following: A Case Study of Arabic
Sabri Boughorbel
Majd Hawasly
77
8
0
23 Oct 2023
SuperTweetEval: A Challenging, Unified and Heterogeneous Benchmark for Social Media NLP Research
Dimosthenis Antypas
Asahi Ushio
Francesco Barbieri
Leonardo Neves
Kiamehr Rezaee
Luis Espinosa-Anke
Jiaxin Pei
Jose Camacho-Collados
66
10
0
23 Oct 2023
Establishing Vocabulary Tests as a Benchmark for Evaluating Large Language Models
Gonzalo Martínez
Javier Conde
Elena Merino-Gómez
Beatriz Bermúdez-Margaretto
José Alberto Hernández
Pedro Reviriego
Marc Brysbaert
ELM
55
1
0
23 Oct 2023
Exploring the Boundaries of GPT-4 in Radiology
Qianchu Liu
Stephanie L. Hyland
Shruthi Bannur
Kenza Bouzid
Daniel Coelho De Castro
...
Anja Thieme
A. Nori
M. Lungren
Ozan Oktay
Javier Alvarez-Valle
LM&MA
AI4CE
90
41
0
23 Oct 2023
AlpaCare: Instruction-tuned Large Language Models for Medical Application
Xinlu Zhang
Chenxin Tian
Xianjun Yang
Lichang Chen
Zekun Li
Linda R. Petzold
LM&MA
118
65
0
23 Oct 2023
An International Consortium for Evaluations of Societal-Scale Risks from Advanced AI
Ross Gruetzemacher
Alan Chan
Kevin Frazier
Christy Manning
Stepán Los
...
Clíodhna Ní Ghuidhir
Mark M. Bailey
Daniel Eth
Toby D. Pilditch
Kyle A. Kilian
51
6
0
22 Oct 2023
Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases
Rishabh Bhardwaj
Soujanya Poria
ALM
116
18
0
22 Oct 2023
An In-Context Schema Understanding Method for Knowledge Base Question Answering
Yantao Liu
Zixuan Li
Xiaolong Jin
Yucan Guo
Long Bai
Saiping Guan
Jiafeng Guo
Xueqi Cheng
65
2
0
22 Oct 2023
Orthogonal Subspace Learning for Language Model Continual Learning
Xiao Wang
Tianze Chen
Qiming Ge
Han Xia
Rong Bao
Rui Zheng
Qi Zhang
Tao Gui
Xuanjing Huang
CLL
155
114
0
22 Oct 2023
PromptCBLUE: A Chinese Prompt Tuning Benchmark for the Medical Domain
Wei-wei Zhu
Xiaoling Wang
Huanran Zheng
Mosha Chen
Buzhou Tang
ELM
LM&MA
69
36
0
22 Oct 2023
Revisiting Instruction Fine-tuned Model Evaluation to Guide Industrial Applications
Manuel Faysse
Gautier Viaud
Céline Hudelot
Pierre Colombo
84
11
0
21 Oct 2023
POSQA: Probe the World Models of LLMs with Size Comparisons
Chang Shu
Paul Burgess
Fangyu Liu
Ehsan Shareghi
Nigel Collier
52
2
0
20 Oct 2023
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
David Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
366
246
0
20 Oct 2023
AgentTuning: Enabling Generalized Agent Abilities for LLMs
Aohan Zeng
Mingdao Liu
Rui Lu
Bowen Wang
Xiao Liu
Yuxiao Dong
Jie Tang
LM&MA
ALM
LLMAG
118
186
0
19 Oct 2023
MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter
Zhiyuan Liu
Changhao Nai
Yancheng Luo
Hao Fei
Yixin Cao
Kenji Kawaguchi
Xiang Wang
Tat-Seng Chua
92
93
0
19 Oct 2023
The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions
Siru Ouyang
Shuohang Wang
Yang Liu
Ming Zhong
Yizhu Jiao
Dan Iter
Reid Pryzant
Chenguang Zhu
Heng Ji
Jiawei Han
98
32
0
19 Oct 2023
A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction
Ruihao Shui
Yixin Cao
Xiang Wang
Tat-Seng Chua
ELM
AILaw
66
22
0
18 Oct 2023
Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting
Guande He
Peng Cui
Jianfei Chen
Wenbo Hu
Jun Zhu
96
12
0
18 Oct 2023
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning
Ming Li
Lichang Chen
Jiuhai Chen
Shwai He
Heng-Chiao Huang
Jiuxiang Gu
Dinesh Manocha
152
24
0
18 Oct 2023
Evaluating the Symbol Binding Ability of Large Language Models for Multiple-Choice Questions in Vietnamese General Education
Duc-Vu Nguyen
Quoc-Nam Nguyen
175
6
0
18 Oct 2023
Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective
Ming Zhong
Chenxin An
Weizhu Chen
Jiawei Han
Pengcheng He
101
12
0
17 Oct 2023
NuclearQA: A Human-Made Benchmark for Language Models for the Nuclear Domain
Anurag Acharya
Sai Munikoti
Aaron Hellinger
Sara Smith
S. Wagle
Sameera Horawalavithana
ELM
105
3
0
17 Oct 2023
Unlocking Emergent Modularity in Large Language Models
Zihan Qiu
Zeyu Huang
Jie Fu
89
10
0
17 Oct 2023
BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology
Odhran O'Donoghue
Aleksandar Shtedritski
John Ginger
Ralph Abboud
Ali E. Ghareeb
Justin Booth
Samuel G. Rodriques
93
21
0
16 Oct 2023
Llemma: An Open Language Model For Mathematics
Zhangir Azerbayev
Hailey Schoelkopf
Keiran Paster
Marco Dos Santos
Stephen Marcus McAleer
Albert Q. Jiang
Jia Deng
Stella Biderman
Sean Welleck
CLL
126
303
0
16 Oct 2023
AdaLomo: Low-memory Optimization with Adaptive Learning Rate
Kai Lv
Hang Yan
Qipeng Guo
Haijun Lv
Xipeng Qiu
ODL
88
23
0
16 Oct 2023
Improving Large Language Model Fine-tuning for Solving Math Problems
Yixin Liu
Avi Singh
C. D. Freeman
John D. Co-Reyes
Peter J. Liu
LRM
ReLM
86
50
0
16 Oct 2023
ACES: Generating Diverse Programming Puzzles with Autotelic Generative Models
Julien Pourcel
Cédric Colas
Gaia Molinaro
Pierre-Yves Oudeyer
Laetitia Teodorescu
97
3
0
15 Oct 2023
KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models
Yuyang Bai
Shangbin Feng
Vidhisha Balachandran
Zhaoxuan Tan
Shiqi Lou
Tianxing He
Yulia Tsvetkov
ELM
95
3
0
15 Oct 2023
Can Large Language Model Comprehend Ancient Chinese? A Preliminary Test on ACLUE
Yixuan Zhang
Haonan Li
LRM
ELM
44
11
0
14 Oct 2023
Instruction Tuning with Human Curriculum
Bruce W. Lee
Hyunsoo Cho
Kang Min Yoo
87
4
0
14 Oct 2023
Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration
Fanqi Wan
Xinting Huang
Tao Yang
Xiaojun Quan
Wei Bi
Shuming Shi
ALM
91
21
0
13 Oct 2023
The Consensus Game: Language Model Generation via Equilibrium Search
Athul Paul Jacob
Songlin Yang
Gabriele Farina
Jacob Andreas
93
23
0
13 Oct 2023
Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model
Qichen Ye
Junling Liu
Dading Chong
Peilin Zhou
Yining Hua
...
Meng Cao
Ziming Wang
Xuxin Cheng
Andrew Liu
Zhenhua Guo
AI4MH
LM&MA
ELM
89
22
0
13 Oct 2023
InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems
Willy Chung
Samuel Cahyawijaya
Bryan Wilie
Holy Lovenia
Pascale Fung
80
6
0
13 Oct 2023
Impact of Guidance and Interaction Strategies for LLM Use on Learner Performance and Perception
Harsh Kumar
Ilya Musabirov
Mohi Reza
Jiakai Shi
Xinyuan Wang
Joseph Jay Williams
Anastasia Kuzminykh
Michael Liut
68
32
0
13 Oct 2023
GLoRE: Evaluating Logical Reasoning of Large Language Models
Hanmeng Liu
Zhiyang Teng
Ruoxi Ning
Jian Liu
Qiji Zhou
Yuexin Zhang
Yue Zhang
ReLM
ELM
LRM
164
8
0
13 Oct 2023
A Confederacy of Models: a Comprehensive Evaluation of LLMs on Creative Writing
Carlos Gómez-Rodríguez
Paul Williams
88
83
0
12 Oct 2023
Exploring the Cognitive Knowledge Structure of Large Language Models: An Educational Diagnostic Assessment Approach
Zheyuan Zhang
Jifan Yu
Juanzi Li
Lei Hou
AI4Ed
51
3
0
12 Oct 2023
GameGPT: Multi-agent Collaborative Framework for Game Development
Dake Chen
Hanbin Wang
Yunhao Huo
Yuzhao Li
Haoyang Zhang
LLMAG
94
19
0
12 Oct 2023
Understanding and Controlling a Maze-Solving Policy Network
Ulisse Mini
Peli Grietzer
Mrinank Sharma
Austin Meek
M. MacDiarmid
Alexander Matt Turner
51
18
0
12 Oct 2023
Large Language Models Are Zero-Shot Time Series Forecasters
Nate Gruver
Marc Finzi
Shikai Qiu
Andrew Gordon Wilson
AI4TS
97
375
0
11 Oct 2023
Composite Backdoor Attacks Against Large Language Models
Hai Huang
Zhengyu Zhao
Michael Backes
Yun Shen
Yang Zhang
AAML
80
49
0
11 Oct 2023