2009.03300
Cited By
v1
v2
v3 (latest)
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
Shared Imagination: LLMs Hallucinate Alike
Yilun Zhou
Caiming Xiong
Silvio Savarese
Chien-Sheng Wu
HILM
60
2
0
23 Jul 2024
Enhancing LLM's Cognition via Structurization
Kai-Chun Liu
Zhihang Fu
Chao Chen
Wei Zhang
Rongxin Jiang
Fan Zhou
Yao-Shen Chen
Yue-bo Wu
Jieping Ye
83
1
0
23 Jul 2024
A deeper look at depth pruning of LLMs
Shoaib Ahmed Siddiqui
Xin Dong
Greg Heinrich
Thomas Breuel
Jan Kautz
David M. Krueger
Pavlo Molchanov
74
11
0
23 Jul 2024
Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models
Nishanth Madhusudhan
Sathwik Tejaswi Madhusudhan
Vikas Yadav
Masoud Hashemi
116
11
0
23 Jul 2024
A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More
Zhichao Wang
Bin Bi
Shiva K. Pentyala
Kiran Ramnath
Sougata Chaudhuri
...
Z. Zhu
Xiang-Bo Mao
S. Asur
Na Cheng
OffRL
97
58
0
23 Jul 2024
DDK: Distilling Domain Knowledge for Efficient Large Language Models
Jiaheng Liu
Chenchen Zhang
Jinyang Guo
Yuanxing Zhang
Haoran Que
...
Congnan Liu
Wenbo Su
Jiamang Wang
Lin Qu
Bo Zheng
110
6
0
23 Jul 2024
Benchmarks as Microscopes: A Call for Model Metrology
Michael Stephen Saxon
Ari Holtzman
Peter West
William Y. Wang
Naomi Saphra
114
13
0
22 Jul 2024
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?
Ori Yoran
S. Amouyal
Chaitanya Malaviya
Ben Bogin
Ofir Press
Jonathan Berant
LLMAG
109
44
0
22 Jul 2024
Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models
Georgy Tyukin
G. Dovonon
Jean Kaddour
Pasquale Minervini
LRM
70
2
0
22 Jul 2024
ALLaM: Large Language Models for Arabic and English
M Saiful Bari
Yazeed Alnumay
Norah A. Alzahrani
Nouf M. Alotaibi
H. A. Alyahya
...
Jeril Kuriakose
Abdalghani Abujabal
Nora Al-Twairesh
Areeb Alowisheq
Haidar Khan
77
17
0
22 Jul 2024
ZZU-NLP at SIGHAN-2024 dimABSA Task: Aspect-Based Sentiment Analysis with Coarse-to-Fine In-context Learning
Senbin Zhu
Hanjie Zhao
Xingren Wang
Shanhong Liu
Yuxiang Jia
Hongying Zan
74
2
0
22 Jul 2024
Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners
Yifei Gao
Jie Ou
Lei Wang
Fanhua Shang
Jaji Wu
MQ
111
0
0
22 Jul 2024
VideoGameBunny: Towards vision assistants for video games
Mohammad Reza Taesiri
Cor-Paul Bezemer
VLM
MLLM
81
2
0
21 Jul 2024
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
Sarah Wiegreffe
Oyvind Tafjord
Yonatan Belinkov
Hanna Hajishirzi
Ashish Sabharwal
94
9
0
21 Jul 2024
Compact Language Models via Pruning and Knowledge Distillation
Saurav Muralidharan
Sharath Turuvekere Sreenivas
Raviraj Joshi
Marcin Chochowski
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
Jan Kautz
Pavlo Molchanov
SyDa
MQ
108
57
0
19 Jul 2024
Internal Consistency and Self-Feedback in Large Language Models: A Survey
Xun Liang
Shichao Song
Zifan Zheng
Hanyu Wang
Qingchen Yu
...
Rong-Hua Li
Peng Cheng
Zhonghao Wang
Feiyu Xiong
Zhiyu Li
HILM
LRM
162
30
0
19 Jul 2024
LLMs left, right, and center: Assessing GPT's capabilities to label political bias from web domains
Raphael Hernandes
50
5
0
19 Jul 2024
Werewolf Arena: A Case Study in LLM Evaluation via Social Deduction
Suma Bailis
Jane Friedhoff
Feiyang Chen
86
9
0
18 Jul 2024
Thought-Like-Pro: Enhancing Reasoning of Large Language Models through Self-Driven Prolog-based Chain-of-Thought
Jue Chen
Yongxin Deng
Xihe Qiu
Weidi Xu
Chao Qu
Wei Chu
Yinghui Xu
Yuan Qi
LRM
AI4CE
LM&Ro
96
3
0
18 Jul 2024
LLMs as Function Approximators: Terminology, Taxonomy, and Questions for Evaluation
David Schlangen
78
1
0
18 Jul 2024
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains
Guoli Yin
Haoping Bai
Shuang Ma
Feng Nan
Yanchao Sun
...
Xiaoming Wang
Jiulong Shan
Meng Cao
Ruoming Pang
Zirui Wang
LLMAG
ELM
79
7
0
18 Jul 2024
Questionable practices in machine learning
Gavin Leech
Juan J. Vazquez
Misha Yagudin
Niclas Kupper
Laurence Aitchison
110
6
0
17 Jul 2024
MINI-LLM: Memory-Efficient Structured Pruning for Large Language Models
Hongrong Cheng
Miao Zhang
J. Q. Shi
107
3
0
16 Jul 2024
LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices
Jung Hyun Lee
Jeonghoon Kim
J. Yang
S. Kwon
Eunho Yang
Kang Min Yoo
Dongsoo Lee
MQ
141
3
0
16 Jul 2024
Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models
Qingcheng Zeng
Mingyu Jin
Qinkai Yu
Zhenting Wang
Wenyue Hua
...
Felix Juefei Xu
Kaize Ding
Fan Yang
Ruixiang Tang
Yongfeng Zhang
AAML
99
11
0
15 Jul 2024
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
Hongyu Wang
Shuming Ma
Ruiping Wang
Furu Wei
MoE
88
13
0
15 Jul 2024
Understanding the Importance of Evolutionary Search in Automated Heuristic Design with Large Language Models
Rui Zhang
Fei Liu
Xi Lin
Zhenkun Wang
Zhichao Lu
Qingfu Zhang
57
9
0
15 Jul 2024
Mix-CPT: A Domain Adaptation Framework via Decoupling Knowledge Learning and Format Alignment
Jinhao Jiang
Junyi Li
Wayne Xin Zhao
Yang Song
Tao Zhang
Ji-Rong Wen
CLL
92
3
0
15 Jul 2024
Qwen2 Technical Report
An Yang
Baosong Yang
Binyuan Hui
Jian Xu
Bowen Yu
...
Yuqiong Liu
Zeyu Cui
Zhenru Zhang
Zhifang Guo
Zhi-Wei Fan
OSLM
VLM
MU
239
989
0
15 Jul 2024
Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique
M. Russinovich
Ahmed Salem
156
13
0
15 Jul 2024
LAB-Bench: Measuring Capabilities of Language Models for Biology Research
Jon M. Laurent
Joseph D. Janizek
Michael Ruzo
Michaela M. Hinks
M. Hammerling
Siddharth Narayanan
Manvitha Ponnapati
Andrew D. White
Samuel G. Rodriques
ELM
103
55
0
14 Jul 2024
Look Within, Why LLMs Hallucinate: A Causal Perspective
He Li
Haoang Chi
Mingyu Liu
Wenjing Yang
LRM
73
6
0
14 Jul 2024
Revolutionizing Bridge Operation and maintenance with LLM-based Agents: An Overview of Applications and Insights
Xinyu Chen
Lianzhen Zhang
LLMAG
AI4CE
110
4
0
14 Jul 2024
Bilingual Adaptation of Monolingual Foundation Models
Gurpreet Gosal
Yishi Xu
Gokul Ramakrishnan
Rituraj Joshi
Avraham Sheinin
...
Rahul Pal
Parvez Mullah
Soundar Doraiswamy
Mohamed El Karim Chami
Preslav Nakov
CLL
108
3
0
13 Jul 2024
Beyond KV Caching: Shared Attention for Efficient LLMs
Bingli Liao
Danilo Vasconcellos Vargas
63
5
0
13 Jul 2024
NativQA: Multilingual Culturally-Aligned Natural Query for LLMs
Md. Arid Hasan
Maram Hasanain
Fatema Ahmad
Sahinur Rahman Laskar
Sunaya Upadhyay
Vrunda N. Sukhadia
Mucahid Kutlu
Shammur A. Chowdhury
Firoj Alam
171
7
0
13 Jul 2024
MSEval: A Dataset for Material Selection in Conceptual Design to Evaluate Algorithmic Models
Yash Jain
Daniele Grandi
Allin Groom
Brandon Cramer
Christopher McComb
67
0
0
12 Jul 2024
GRAD-SUM: Leveraging Gradient Summarization for Optimal Prompt Engineering
Derek Austin
Elliott Chartock
52
5
0
12 Jul 2024
Accuracy is Not All You Need
Abhinav Dutta
Sanjeev Krishnan
Nipun Kwatra
Ramachandran Ramjee
103
4
0
12 Jul 2024
Self-Prompt Tuning: Enable Autonomous Role-Playing in LLMs
Aobo Kong
Shiwan Zhao
Hao Chen
Qicheng Li
Yong Qin
Ruiqi Sun
Xin Zhou
Jiaming Zhou
Haoqin Sun
100
12
0
12 Jul 2024
Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation
Biqing Qi
Kaiyan Zhang
Kai Tian
Haoxiang Li
Zhang-Ren Chen
Sihang Zeng
Ermo Hua
Hu Jinfang
Bowen Zhou
LM&MA
127
18
0
12 Jul 2024
Self-Evolving GPT: A Lifelong Autonomous Experiential Learner
Jinglong Gao
Xiao Ding
Yiming Cui
Jianbai Zhao
Hepeng Wang
Ting Liu
Bing Qin
KELM
CLL
119
8
0
12 Jul 2024
Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts
Zeliang Zhang
Xiaodong Liu
Hao Cheng
Chenliang Xu
Jianfeng Gao
MoE
163
11
0
12 Jul 2024
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
Youliang Yuan
Wenxiang Jiao
Wenxuan Wang
Jen-tse Huang
Jiahao Xu
Tian Liang
Pinjia He
Zhaopeng Tu
117
32
0
12 Jul 2024
Mitigating Catastrophic Forgetting in Language Transfer via Model Merging
Anton Alexandrov
Veselin Raychev
Mark Niklas Muller
Ce Zhang
Martin Vechev
Kristina Toutanova
MoMe
CLL
KELM
127
20
0
11 Jul 2024
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients
Zhenyu Zhang
Ajay Jaiswal
L. Yin
Shiwei Liu
Jiawei Zhao
Yuandong Tian
Zhangyang Wang
VLM
77
23
0
11 Jul 2024
SoupLM: Model Integration in Large Language and Multi-Modal Models
Yue Bai
Zichen Zhang
Jiasen Lu
Yun Fu
MoMe
62
1
0
11 Jul 2024
AutoBencher: Towards Declarative Benchmark Construction
Xiang Lisa Li
Emmy Liu
Percy Liang
Tatsunori Hashimoto
92
9
0
11 Jul 2024
Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing
Huanqian Wang
Yang Yue
Rui Lu
Jingxin Shi
Andrew Zhao
Shenzhi Wang
Shiji Song
Gao Huang
LM&Ro
KELM
143
0
0
11 Jul 2024
RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
Xijie Huang
Zechun Liu
Shih-yang Liu
Kwang-Ting Cheng
MQ
91
9
0
10 Jul 2024