2009.03300
Cited By
v1
v2
v3 (latest)
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks
Guanhua Zhang
Moritz Hardt
100
11
0
02 May 2024
FLAME: Factuality-Aware Alignment for Large Language Models
Sheng-Chieh Lin
Luyu Gao
Barlas Oğuz
Wenhan Xiong
Jimmy Lin
Wen-tau Yih
Xilun Chen
HILM
95
20
0
02 May 2024
CACTUS: Chemistry Agent Connecting Tool-Usage to Science
Andrew D. McNaughton
Gautham Ramalaxmi
Agustin Kruel
C. Knutson
R. Varikoti
Neeraj Kumar
118
11
0
02 May 2024
WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting
Olly Styles
Sam Miller
Patricio Cerda-Mardini
T. Guha
Victor Sanchez
Bertie Vidgen
LLMAG
86
6
0
01 May 2024
Self-Play Preference Optimization for Language Model Alignment
Yue Wu
Zhiqing Sun
Huizhuo Yuan
Kaixuan Ji
Yiming Yang
Quanquan Gu
147
145
0
01 May 2024
NumLLM: Numeric-Sensitive Large Language Model for Chinese Finance
Huan-Yi Su
Ke Wu
Yu-Hao Huang
Wu-Jun Li
88
1
0
01 May 2024
Harnessing the Power of Multiple Minds: Lessons Learned from LLM Routing
KV Aditya Srivatsa
Kaushal Kumar Maurya
Ekaterina Kochmar
112
17
0
01 May 2024
A Careful Examination of Large Language Model Performance on Grade School Arithmetic
Hugh Zhang
Jeff Da
Dean Lee
Vaughn Robinson
Catherine Wu
...
Qin Lyu
Sean Hendryx
Russell Kaplan
Michele Lunati
Summer Yue
ALM
LRM
ELM
108
110
0
01 May 2024
Extending Llama-3's Context Ten-Fold Overnight
Peitian Zhang
Ninglu Shao
Zheng Liu
Shitao Xiao
Hongjin Qian
Qiwei Ye
Zhicheng Dou
SyDa
51
17
0
30 Apr 2024
Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom
Shisen Yue
Siyuan Song
Xinyuan Cheng
Hai Hu
105
3
0
30 Apr 2024
Octopus v4: Graph of language models
Wei Chen
Zhiyuan Li
66
7
0
30 Apr 2024
GRAMMAR: Grounded and Modular Methodology for Assessment of Closed-Domain Retrieval-Augmented Language Model
Xinzhe Li
Ming Liu
Shang Gao
RALM
103
0
0
30 Apr 2024
Benchmarking Benchmark Leakage in Large Language Models
Ruijie Xu
Zengzhi Wang
Run-Ze Fan
Pengfei Liu
129
54
0
29 Apr 2024
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
Pat Verga
Sebastian Hofstatter
Sophia Althammer
Yixuan Su
Aleksandra Piktus
Arkady Arkhangorodsky
Minjie Xu
Naomi White
Patrick Lewis
ALM
ELM
140
105
0
29 Apr 2024
PECC: Problem Extraction and Coding Challenges
Patrick Haller
Jonas Golde
Alan Akbik
ReLM
62
6
0
29 Apr 2024
HFT: Half Fine-Tuning for Large Language Models
Tingfeng Hui
Zhenyu Zhang
Shuohuan Wang
Weiran Xu
Yu Sun
Hua Wu
CLL
103
7
0
29 Apr 2024
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Justin Zhao
Timothy Wang
Wael Abid
Geoffrey Angus
Arnav Garg
Jeffery Kinnison
Alex Sherstinsky
Piero Molino
Travis Addair
Devvret Rishi
ALM
117
35
0
29 Apr 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
289
197
0
29 Apr 2024
Mixture-of-Instructions: Aligning Large Language Models via Mixture Prompting
Bowen Xu
Shaoyu Wu
Kai Liu
Lulu Hu
69
1
0
29 Apr 2024
What is Reproducibility in Artificial Intelligence and Machine Learning Research?
Abhyuday Desai
Mohamed Abdelhamid
N. R. Padalkar
AI4CE
81
3
0
29 Apr 2024
BlockLLM: Multi-tenant Finer-grained Serving for Large Language Models
Jiamin Li
Le Xu
Hong-Yu Xu
Aditya Akella
58
2
0
28 Apr 2024
PatentGPT: A Large Language Model for Intellectual Property
Zilong Bai
Ruiji Zhang
Linqing Chen
Qijun Cai
Yuan Zhong
...
Fu Bian
Xiaolong Gu
Lisha Zhang
Weilei Wang
Changyang Tu
105
5
0
28 Apr 2024
CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving
Pei Chen
Boran Han
Shuai Zhang
LRM
LLMAG
83
5
0
26 Apr 2024
A Comprehensive Evaluation on Event Reasoning of Large Language Models
Zhengwei Tao
Zhi Jin
Yifan Zhang
Xiancai Chen
Xiaoying Bai
Yue Fang
Haiyan Zhao
Jia Li
Chongyang Tao
LRM
68
4
0
26 Apr 2024
InspectorRAGet: An Introspection Platform for RAG Evaluation
Kshitij P. Fadnis
Siva Sankalp Patel
O. Boni
Yannis Katsis
Sara Rosenthal
Benjamin Sznajder
Marina Danilevsky
53
3
0
26 Apr 2024
Near to Mid-term Risks and Opportunities of Open-Source Generative AI
Francisco Eiras
Aleksandar Petrov
Bertie Vidgen
Christian Schroeder de Witt
Fabio Pizzati
...
Paul Röttger
Philip Torr
Trevor Darrell
Y. Lee
Jakob N. Foerster
117
8
0
25 Apr 2024
Türkçe Dil Modellerinin Performans Karşılaştırması Performance Comparison of Turkish Language Models
Eren Dogan
M. E. Uzun
Atahan Uz
H. E. Seyrek
Ahmed Zeer
Ezgi Sevi
Himmet Toprak Kesgin
M. K. Yuce
M. Amasyalı
ELM
61
0
0
25 Apr 2024
Make Your LLM Fully Utilize the Context
Shengnan An
Zexiong Ma
Zeqi Lin
Nanning Zheng
Jian-Guang Lou
SyDa
156
67
0
25 Apr 2024
Continual Learning of Large Language Models: A Comprehensive Survey
Haizhou Shi
Zihao Xu
Hengyi Wang
Weiyi Qin
Wenyuan Wang
Yibin Wang
Zifeng Wang
Sayna Ebrahimi
Hao Wang
CLL
KELM
LRM
162
88
0
25 Apr 2024
REBEL: Reinforcement Learning via Regressing Relative Rewards
Zhaolin Gao
Jonathan D. Chang
Wenhao Zhan
Owen Oertell
Gokul Swamy
Kianté Brantley
Thorsten Joachims
J. Andrew Bagnell
Jason D. Lee
Wen Sun
OffRL
87
41
0
25 Apr 2024
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Mostafa Elhoushi
Akshat Shrivastava
Diana Liskovich
Basil Hosmer
Bram Wasti
...
Saurabh Agarwal
Ahmed Roman
Ahmed Aly
Beidi Chen
Carole-Jean Wu
LRM
114
110
0
25 Apr 2024
Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents
Giorgio Piatti
Zhijing Jin
Max Kleiman-Weiner
Bernhard Schölkopf
Mrinmaya Sachan
Rada Mihalcea
LLMAG
111
25
0
25 Apr 2024
Large Language Models in the Clinic: A Comprehensive Benchmark
Andrew Liu
Hongjian Zhou
Yining Hua
Omid Rohanian
Anshul Thakur
Lei A. Clifton
David Clifton
AI4MH
LM&MA
95
10
0
25 Apr 2024
Tele-FLM Technical Report
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Chao Wang
...
Yequan Wang
Zhongjiang He
Zhongyuan Wang
Xuelong Li
Tiejun Huang
81
4
0
25 Apr 2024
Model Extrapolation Expedites Alignment
Chujie Zheng
Ziqi Wang
Heng Ji
Minlie Huang
Nanyun Peng
MoMe
87
33
0
25 Apr 2024
Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach
Linyu Liu
Yu Pan
Xiaocheng Li
Guanting Chen
105
39
0
24 Apr 2024
From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models
Qi He
Jie Zeng
Qianxi He
Jiaqing Liang
Yanghua Xiao
113
18
0
24 Apr 2024
A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry
Yining Huang
Keke Tang
Meilian Chen
Boyuan Wang
ELM
LM&MA
72
15
0
24 Apr 2024
Retrieval Head Mechanistically Explains Long-Context Factuality
Wenhao Wu
Yizhong Wang
Guangxuan Xiao
Hao-Chun Peng
Yao Fu
LRM
105
84
0
24 Apr 2024
Evaluating Large Language Models for Material Selection
Daniele Grandi
Yash Jain
Allin Groom
Brandon Cramer
Christopher McComb
73
10
0
23 Apr 2024
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
Yifeng Ding
Jiawei Liu
Yuxiang Wei
Terry Yue Zhuo
Lingming Zhang
ALM
MoE
99
3
0
23 Apr 2024
Does Instruction Tuning Make LLMs More Consistent?
Constanza Fierro
Jiaang Li
Anders Sogaard
LRM
104
2
0
23 Apr 2024
CT-Agent: Clinical Trial Multi-Agent with Large Language Model-based Reasoning
Ling Yue
Tianfan Fu
LLMAG
LRM
ELM
61
7
0
23 Apr 2024
SHED: Shapley-Based Automated Dataset Refinement for Instruction Fine-Tuning
Yexiao He
Ziyao Wang
Zheyu Shen
Guoheng Sun
Yucong Dai
Yongkai Wu
Hongyi Wang
Ang Li
102
13
0
23 Apr 2024
From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li
Jiajie Jin
Yujia Zhou
Yuyao Zhang
Peitian Zhang
Yutao Zhu
Zhicheng Dou
3DV
215
61
0
23 Apr 2024
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
Amir Saeidi
Shivanshu Verma
Chitta Baral
ALM
112
26
0
23 Apr 2024
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems
Qihuang Zhong
Kang Wang
Ziyang Xu
Juhua Liu
Liang Ding
Bo Du
LRM
AIMat
166
4
0
23 Apr 2024
OpenELM: An Efficient Language Model Family with Open Training and Inference Framework
Sachin Mehta
Mohammad Hossein Sekhavat
Qingqing Cao
Maxwell Horton
Yanzi Jin
...
Iman Mirzadeh
Mahyar Najibi
Dmitry Belenko
Peter Zatloukal
Mohammad Rastegari
OSLM
AIFin
108
61
0
22 Apr 2024
Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing
Dujian Ding
Ankur Mallick
Chi Wang
Robert Sim
Subhabrata Mukherjee
Victor Rühle
L. Lakshmanan
Ahmed Hassan Awadallah
176
107
0
22 Apr 2024
An empirical study of LLaMA3 quantization: from LLMs to MLLMs
Wei Huang
Xingyu Zheng
Xudong Ma
Haotong Qin
Chengtao Lv
Hong Chen
Jie Luo
Xiaojuan Qi
Xianglong Liu
Michele Magno
MQ
152
42
0
22 Apr 2024